
Showing papers on "Image quality published in 2021"


Journal ArticleDOI
TL;DR: Experimental results show that the intelligent module provides energy-efficient, secure transmission with low computational time as well as a reduced bit error rate, which is a key requirement for the intelligent manufacturing of VSNs.
Abstract: Due to technological advancement, smart visual sensing is required in terms of data-transfer capacity, energy efficiency, security, and computational efficiency. High-quality image transmission in visual sensor networks (VSNs) consumes considerable space, energy, and transmission time, and may be exposed to various security threats. Image compression is a key phase of visual sensing systems and needs to be effective. This motivates us to propose a fast and efficient intelligent image transmission module that achieves energy efficiency, minimum delay, and good bandwidth utilization. Compressive sensing (CS) is introduced to compress the image quickly, which reduces energy consumption, minimizes time, and uses bandwidth efficiently. However, CS alone cannot provide security against different kinds of threats. Several methods have been introduced over the last decade to address the security challenges in the CS domain, but efficiency remains a key requirement considering the intelligent manufacturing of VSNs. Furthermore, the random variables selected for CS degrade the recovered image quality due to the accumulation of noise. Concerning the above challenges, this paper introduces a novel one-way image transmission module for multiple-input multiple-output systems that provides secure and energy-efficient transmission with the CS model. Secure transmission in the CS domain is achieved using a security matrix, called a compressed secured matrix, and perfect reconstruction is obtained with the random measurement matrix of CS. Experimental results show that the intelligent module provides energy-efficient, secure transmission with low computational time as well as a reduced bit error rate.
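As a concrete illustration of the compressive-sensing step described above (not the authors' exact pipeline), the sketch below measures a sparse signal with a random Gaussian matrix and recovers it with ISTA, a basic l1-minimization solver; the dimensions, sparsity level, and solver choice are assumptions.

```python
# Minimal compressive-sensing sketch: a sparse signal is measured with a random
# Gaussian matrix and recovered with ISTA. Sizes and sparsity are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 96, 8                  # signal length, measurements, nonzeros

x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)   # sparse "image" vector

Phi = rng.standard_normal((m, n)) / np.sqrt(m)                # measurement matrix
y = Phi @ x                                                   # compressed measurements

def ista(Phi, y, lam=0.01, iters=500):
    """Iterative shrinkage-thresholding for min_x 0.5*||Phi x - y||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2                  # 1 / Lipschitz constant
    x_hat = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = Phi.T @ (Phi @ x_hat - y)
        z = x_hat - step * grad
        x_hat = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x_hat

x_hat = ista(Phi, y)
print("relative recovery error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```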

262 citations


Journal ArticleDOI
TL;DR: This paper proposes a two-way image transmission scheme built on the Corvus Coron module, which is energy-efficient with the CS model and embeds security directly into the CS transmission, resulting in energy-efficient, secure transmission with a low error rate and low computational time.
Abstract: Two-way image communication over a wireless channel must be compatible with channel properties such as transfer speed, energy efficiency, time usage, and security, because image data occupy considerable space on the device and are exposed to attacks during transmission. In addition, conventional compression adds extra processing time on top of the transmission itself. To address these issues, compressive sensing (CS) is adopted: it compresses the image during the sensing stage itself, which reduces time usage and saves bandwidth, but it falls short with respect to secure transmission. A number of studies have addressed security issues in compressive sensing by adding security as a separate stage. Moreover, the random variables chosen for CS suffer from degraded reconstructed image quality because of the accumulation of noise. In line with these issues, this paper proposes a two-way image transmission scheme based on the Corvus Coron module, which is energy-efficient with the CS model and builds security into the CS transmission itself through a security framework, designated as the pack-secured scheme, together with reconstruction based on the well-known random measurement matrix in CS. Experimental results show that the practical module provides energy-efficient and secure transmission in the form of a low error rate with low computational time.

230 citations


Journal ArticleDOI
TL;DR: This research concludes that SSIM is a better measure of imperceptibility in all aspects, and recommends that future steganographic research use at least SSIM.
Abstract: Peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are two measuring tools that are widely used in image quality assessment. In steganography in particular, these two measures are used to assess the quality of imperceptibility. PSNR was introduced earlier than SSIM, is simple to compute, has been widely used in various digital image measurements, and is considered tested and valid. SSIM is a newer measurement tool that is designed around three factors, i.e. luminance, contrast, and structure, to better match the workings of the human visual system. Some research has discussed the correlation and comparison of these two measuring tools, but no research explicitly discusses and suggests which measurement tool is more suitable for steganography. This study aims to review, prove, and analyze the results of PSNR and SSIM measurements on three spatial-domain image steganography methods, i.e. LSB, PVD, and CRT. Color images were chosen as container images because human vision is more sensitive to color changes than to grayscale changes. The test results reveal several opposing findings: LSB scores highest according to PSNR, while PVD scores highest according to SSIM. Additionally, the changes visible in the histogram are more noticeable in LSB and CRT than in PVD. Other analyses, such as the RS attack, also show results that are more in line with SSIM measurements than with PSNR. Based on the results of testing and analysis, this research concludes that SSIM is a better measure of imperceptibility in all aspects, and it is preferable that future steganographic research use at least SSIM.
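Since the whole comparison rests on how PSNR and SSIM are computed, here is a minimal sketch of both metrics in Python; the PSNR formula follows its standard definition, SSIM uses scikit-image (the channel_axis argument assumes a recent version), and the file names are placeholders.

```python
# PSNR from its definition and SSIM via scikit-image, for a cover image and its
# stego counterpart. File names are placeholders.
import numpy as np
from skimage import io
from skimage.metrics import structural_similarity

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10*log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

cover = io.imread("cover.png")        # color cover image (H, W, 3), uint8
stego = io.imread("stego.png")        # stego image after LSB/PVD/CRT embedding

print("PSNR:", psnr(cover, stego), "dB")
print("SSIM:", structural_similarity(cover, stego, channel_axis=-1, data_range=255))
```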

204 citations


Proceedings ArticleDOI
Han Zhang1, Jing Yu Koh1, Jason Baldridge1, Honglak Lee2, Yinfei Yang1 
20 Jun 2021
TL;DR: XMC-GAN as mentioned in this paper uses an attentional self-modulation generator, which enforces strong text-image correspondence, and a contrastive discriminator, which acts as a critic as well as a feature encoder for contrastive learning.
Abstract: The output of text-to-image synthesis systems should be coherent, clear, photo-realistic scenes with high semantic fidelity to their conditioned text descriptions. Our Cross-Modal Contrastive Generative Adversarial Network (XMC-GAN) addresses this challenge by maximizing the mutual information between image and text. It does this via multiple contrastive losses which capture inter-modality and intra-modality correspondences. XMC-GAN uses an attentional self-modulation generator, which enforces strong text-image correspondence, and a contrastive discriminator, which acts as a critic as well as a feature encoder for contrastive learning. The quality of XMC-GAN’s output is a major step up from previous models, as we show on three challenging datasets. On MS-COCO, not only does XMC-GAN improve state-of-the-art FID from 24.70 to 9.33, but, more importantly, people prefer XMC-GAN by 77.3% for image quality and 74.1% for image-text alignment, compared to three other recent models. XMC-GAN also generalizes to the challenging Localized Narratives dataset (which has longer, more detailed descriptions), improving state-of-the-art FID from 48.70 to 14.12. Lastly, we train and evaluate XMC-GAN on the challenging Open Images data, establishing a strong benchmark FID score of 26.91.
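The contrastive losses mentioned above are the core mechanism; the sketch below shows a generic symmetric image-text InfoNCE loss of that family in NumPy. The temperature and batch layout are illustrative assumptions, not the paper's exact formulation.

```python
# Symmetric InfoNCE loss over a batch of matched image/text embedding pairs.
import numpy as np

def info_nce(img_emb, txt_emb, tau=0.1):
    """img_emb, txt_emb: (B, D) L2-normalized embeddings of matched pairs."""
    def one_direction(a, b):
        logits = a @ b.T / tau                       # (B, B) similarity matrix
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))               # matched pairs lie on the diagonal
    return 0.5 * (one_direction(img_emb, txt_emb) + one_direction(txt_emb, img_emb))

rng = np.random.default_rng(0)
def l2norm(x): return x / np.linalg.norm(x, axis=1, keepdims=True)
img = l2norm(rng.standard_normal((8, 128)))
txt = l2norm(rng.standard_normal((8, 128)))
print("contrastive loss:", info_nce(img, txt))
```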

195 citations


Journal ArticleDOI
TL;DR: This work proposes a novel recurrent network to reconstruct videos from a stream of events, and trains it on a large amount of simulated event data, and shows that off-the-shelf computer vision algorithms can be applied to the reconstructions and that this strategy consistently outperforms algorithms that were specifically designed for event data.
Abstract: Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous “events” instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality (>20%), while comfortably running in real-time. We show that the network is able to synthesize high framerate videos (>5,000 frames per second) of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model and the datasets to enable further research.
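Learned event-to-video reconstruction typically starts by converting the asynchronous event stream into a fixed-size tensor; the sketch below builds a spatio-temporal voxel grid from (x, y, t, polarity) tuples. The bin count and temporal interpolation are common choices assumed here, not necessarily the paper's exact input representation.

```python
# Convert an event stream (x, y, timestamp, polarity) into a voxel grid with
# bilinear interpolation along the time axis.
import numpy as np

def events_to_voxel_grid(events: np.ndarray, num_bins: int, H: int, W: int) -> np.ndarray:
    """events: (N, 4) array of [x, y, t, polarity] with polarity in {-1, +1}."""
    grid = np.zeros((num_bins, H, W), dtype=np.float32)
    x, y, t, p = events[:, 0].astype(int), events[:, 1].astype(int), events[:, 2], events[:, 3]
    t_norm = (num_bins - 1) * (t - t.min()) / max(t.max() - t.min(), 1e-9)  # to [0, num_bins-1]
    left = np.floor(t_norm).astype(int)
    right = np.minimum(left + 1, num_bins - 1)
    w_right = t_norm - left                                   # bilinear weights in time
    np.add.at(grid, (left, y, x), p * (1.0 - w_right))
    np.add.at(grid, (right, y, x), p * w_right)
    return grid

rng = np.random.default_rng(0)
ev = np.column_stack([rng.integers(0, 64, 1000), rng.integers(0, 48, 1000),
                      np.sort(rng.random(1000)), rng.choice([-1.0, 1.0], 1000)])
print(events_to_voxel_grid(ev, num_bins=5, H=48, W=64).shape)   # (5, 48, 64)
```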

164 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: The Periodic Implicit Generative Adversarial Networks (π-GAN) as discussed by the authors leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent radiance fields.
Abstract: We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches however fall short in two ways: first, they may lack an underlying 3D representation or rely on view-inconsistent rendering, hence synthesizing images that are not multi-view consistent; second, they often depend upon representation network architectures that are not expressive enough, and their results thus lack in image quality. We propose a novel generative model, named Periodic Implicit Generative Adversarial Networks (π-GAN or pi-GAN), for high-quality 3D-aware image synthesis. π-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent radiance fields. The proposed approach obtains state-of-the-art results for 3D-aware image synthesis with multiple real and synthetic datasets.
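The periodic activations at the heart of π-GAN can be illustrated with a tiny sine-activated (SIREN-style) coordinate layer; the frequency scale w0 = 30 and the initialization bounds follow the usual SIREN recipe and are assumptions here, not the published architecture.

```python
# Sine-activated coordinate layers: weights initialized per the SIREN recipe,
# activation sin(w0 * (x W + b)).
import numpy as np

def siren_layer(in_dim: int, out_dim: int, w0: float = 30.0, first: bool = False, seed: int = 0):
    rng = np.random.default_rng(seed)
    bound = 1.0 / in_dim if first else np.sqrt(6.0 / in_dim) / w0
    W = rng.uniform(-bound, bound, size=(in_dim, out_dim))
    b = np.zeros(out_dim)
    return lambda x: np.sin(w0 * (x @ W + b))     # periodic activation

# tiny coordinate network: 3D point -> feature vector
layers = [siren_layer(3, 64, first=True, seed=1), siren_layer(64, 64, seed=2)]
pts = np.random.default_rng(3).uniform(-1, 1, size=(5, 3))
h = pts
for layer in layers:
    h = layer(h)
print(h.shape)    # (5, 64)
```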

146 citations


Journal ArticleDOI
TL;DR: In this paper, a deep Fourier channel attention network (DFCAN) was proposed to learn hierarchical representations of high-frequency information about diverse biological structures using multimodal structured illumination microscopy (SIM).
Abstract: Deep neural networks have enabled astonishing transformations from low-resolution (LR) to super-resolved images. However, whether, and under what imaging conditions, such deep-learning models outperform super-resolution (SR) microscopy is poorly explored. Here, using multimodality structured illumination microscopy (SIM), we first provide an extensive dataset of LR-SR image pairs and evaluate the deep-learning SR models in terms of structural complexity, signal-to-noise ratio and upscaling factor. Second, we devise the deep Fourier channel attention network (DFCAN), which leverages the frequency content difference across distinct features to learn precise hierarchical representations of high-frequency information about diverse biological structures. Third, we show that DFCAN's Fourier domain focalization enables robust reconstruction of SIM images under low signal-to-noise ratio conditions. We demonstrate that DFCAN achieves comparable image quality to SIM over a tenfold longer duration in multicolor live-cell imaging experiments, which reveal the detailed structures of mitochondrial cristae and nucleoids and the interaction dynamics of organelles and cytoskeleton.
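To make the "Fourier channel attention" idea concrete, the following rough sketch reweights feature channels using statistics of their Fourier power spectra; the reduction ratio and the small gating MLP are assumptions and do not reproduce the published DFCAN block.

```python
# Channel attention driven by per-channel Fourier power spectra rather than
# plain spatial averages.
import numpy as np

def fourier_channel_attention(feat: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    """feat: (C, H, W) feature map; W1: (C, C//r); W2: (C//r, C)."""
    power = np.abs(np.fft.fft2(feat, axes=(-2, -1))) ** 2     # per-channel power spectrum
    descriptor = power.mean(axis=(-2, -1))                    # (C,) frequency statistic
    hidden = np.maximum(descriptor @ W1, 0.0)                 # ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ W2)))            # sigmoid gate, (C,)
    return feat * weights[:, None, None]                      # reweight channels

rng = np.random.default_rng(0)
C, r = 32, 4
feat = rng.standard_normal((C, 24, 24))
out = fourier_channel_attention(feat, rng.standard_normal((C, C // r)) * 0.1,
                                rng.standard_normal((C // r, C)) * 0.1)
print(out.shape)   # (32, 24, 24)
```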

132 citations


Journal ArticleDOI
TL;DR: A novel dual-attention denoising network is proposed that combines two parallel branches to process the spatial and spectral information separately and proves the superiority of the method both visually and quantitatively when compared with state-of-the-art methods.
Abstract: Hyperspectral image (HSI) denoising plays an important role in image quality improvement and related applications. Convolutional neural network (CNN)-based image denoising methods have been predominant due to advances made in the field of deep learning in recent years. Spatial and spectral information are crucial to HSI denoising, along with their correlations. However, existing methods fail to consider the global dependence and correlation between spatial and spectral information. Accordingly, in this article, we propose a novel dual-attention denoising network to overcome these limitations. We design two parallel branches to process the spatial and spectral information separately. The position attention module is applied to the spatial branch to formulate the interdependencies on the feature map, while the channel attention module is applied to the spectral branch to simulate the spectral correlation before the two branches are combined. A multiscale structure is also employed to extract and fuse the multiscale features following the fusion of spatial and spectral information. Experimental results on simulated and real data substantiate the superiority of our method both visually and quantitatively when compared with state-of-the-art methods.

114 citations


Journal ArticleDOI
TL;DR: This work proposes a novel method to significantly enhance transformation-based compression standards such as JPEG by transmitting much less data per image at the sender's end, together with a two-step method at the receiver's end that combines a state-of-the-art signal-processing-based recovery method with a deep residual learning model to recover the original data.
Abstract: With the development of big data and network technology, there are more use cases, such as edge computing, that require more secure and efficient multimedia big data transmission. Data compression methods can help achieve many tasks, such as ensuring data integrity and protection as well as efficient transmission. Classical multimedia big data compression relies on methods such as spatial-frequency transformation for lossy compression. Recent approaches use deep learning to further explore the limits of data compression in communication-constrained use cases like the Internet of Things (IoT). In this article, we propose a novel method to significantly enhance transformation-based compression standards like JPEG by transmitting much less data per image at the sender's end. At the receiver's end, we propose a two-step method that combines a state-of-the-art signal-processing-based recovery method with a deep residual learning model to recover the original data. Therefore, in IoT use cases, a sender such as an edge device can transmit only 60% of the data of the original JPEG image without any additional calculation steps, while the image quality can still be recovered at the receiver's end (e.g., cloud servers) with a peak signal-to-noise ratio above 31 dB.

104 citations


Journal ArticleDOI
TL;DR: In this article, a multi-modality medical image fusion algorithm based on the Adolescent Identity Search Algorithm (AISA) for the Non-Subsampled Shearlet Transform is proposed to optimize the fused image and to reduce the computational cost and time.

103 citations


Proceedings ArticleDOI
22 Mar 2021
TL;DR: In this article, the authors adopt high-capacity neural scene representations with periodic activations for jointly optimizing an implicit surface and a radiance field of a scene supervised exclusively with posed 2D images.
Abstract: Novel view synthesis is a challenging and ill-posed inverse rendering problem. Neural rendering techniques have recently achieved photorealistic image quality for this task. State-of-the-art (SOTA) neural volume rendering approaches, however, are slow to train and require minutes of inference (i.e., rendering) time for high image resolutions. We adopt high-capacity neural scene representations with periodic activations for jointly optimizing an implicit surface and a radiance field of a scene supervised exclusively with posed 2D images. Our neural rendering pipeline accelerates SOTA neural volume rendering by about two orders of magnitude and our implicit surface representation is unique in allowing us to export a mesh with view-dependent texture information. Thus, like other implicit surface representations, ours is compatible with traditional graphics pipelines, enabling real-time rendering rates, while achieving unprecedented image quality compared to other surface methods. We assess the quality of our approach using existing datasets as well as high-quality 3D face data captured with a custom multi-camera rig.

Journal ArticleDOI
TL;DR: An overview of data processing, reconstruction methods, and metrics of imaging performance is given; clinical applications are outlined; and potential future developments in the field of photon-counting CT are discussed.
Abstract: The introduction of photon-counting detectors is expected to be the next major breakthrough in clinical x-ray computed tomography (CT). During the last decade, there has been considerable research activity in the field of photon-counting CT, in terms of both hardware development and theoretical understanding of the factors affecting image quality. In this article, we review the recent progress in this field with the intent of highlighting the relationship between detector design considerations and the resulting image quality. We discuss detector design choices such as converter material, pixel size, and readout electronics design, and then elucidate their impact on detector performance in terms of dose efficiency, spatial resolution, and energy resolution. Furthermore, we give an overview of data processing, reconstruction methods and metrics of imaging performance; outline clinical applications; and discuss potential future developments.

Journal ArticleDOI
TL;DR: In this paper, the performance of the new long axial field-of-view (LAFOV) Biograph Vision Quadra PET/CT and a standard axial field-of-view (SAFOV) Biograph Vision 600 PET/CT system was investigated using an intra-patient comparison.
Abstract: To investigate the performance of the new long axial field-of-view (LAFOV) Biograph Vision Quadra PET/CT and a standard axial field-of-view (SAFOV) Biograph Vision 600 PET/CT (both: Siemens Healthineers) system using an intra-patient comparison. Forty-four patients undergoing routine oncological PET/CT were prospectively included and underwent a same-day dual-scanning protocol following a single administration of either 18F-FDG (n = 20), 18F-PSMA-1007 (n = 16) or 68Ga-DOTA-TOC (n = 8). Half the patients first received a clinically routine examination on the SAFOV (FOVaxial 26.3 cm) in continuous bed motion and then immediately afterwards on the LAFOV system (10-min acquisition in list mode, FOVaxial 106 cm); the second half underwent scanning in the reverse order. Comparisons between the LAFOV at different emulated scan times (by rebinning list mode data) and the SAFOV were made for target lesion integral activity, signal to noise (SNR), target lesion to background ratio (TBR) and visual image quality. Equivalent target lesion integral activity to the SAFOV acquisitions (16-min duration for a 106 cm FOV) were obtained on the LAFOV in 1.63 ± 0.19 min (mean ± standard error). Equivalent SNR was obtained by 1.82 ± 1.00 min LAFOV acquisitions. No statistically significant differences (p > 0.05) in TBR were observed even for 0.5 min LAFOV examinations. Subjective image quality rated by two physicians confirmed the 10 min LAFOV to be of the highest quality, with equivalence between the LAFOV and the SAFOV at 1.8 ± 0.85 min. By analogy, if the LAFOV scans were maintained at 10 min, proportional reductions in applied radiopharmaceutical could obtain equivalent lesion integral activity for activities under 40 MBq and equivalent doses for the PET component of <1 mSv. Improved image quality, lesion quantification and SNR resulting from higher sensitivity were demonstrated for an LAFOV system in a head-to-head comparison under clinical conditions. The LAFOV system could deliver images of comparable quality and lesion quantification in under 2 min, compared to routine SAFOV acquisition (16 min for equivalent FOV coverage). Alternatively, the LAFOV system could allow for low-dose examination protocols. Shorter LAFOV acquisitions (0.5 min), while of lower visual quality and SNR, were of adequate quality with respect to target lesion identification, suggesting that ultra-fast or low-dose acquisitions can be acceptable in selected settings.

Proceedings ArticleDOI
19 Sep 2021
TL;DR: This paper proposes an architecture of using a shallow Transformer encoder on the top of a feature map extracted by convolution neural networks (CNN), and finds that the proposed TRIQ architecture achieves outstanding performance.
Abstract: Transformer has become the new standard method in natural language processing (NLP), and it also attracts research interests in computer vision area. In this paper we investigate the application of Transformer in Image Quality (TRIQ) assessment. Following the original Transformer encoder employed in Vision Transformer (ViT), we propose an architecture of using a shallow Transformer encoder on the top of a feature map extracted by convolution neural networks (CNN). Adaptive positional embedding is employed in the Transformer encoder to handle images with arbitrary resolutions. Different settings of Transformer architectures have been investigated on publicly available image quality databases. We have found that the proposed TRIQ architecture achieves outstanding performance. The implementation of TRIQ is published on Github (this https URL).
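A minimal PyTorch sketch of the idea, a shallow Transformer encoder reading tokens from a CNN feature map plus a learnable quality token, is given below; the toy backbone, embedding width, simple cropped positional embedding (a simplification of the adaptive embedding), and all hyperparameters are illustrative assumptions.

```python
# Shallow Transformer encoder over CNN feature-map tokens, with a learnable
# quality token whose output is mapped to a scalar score.
import torch
import torch.nn as nn

class TinyTRIQ(nn.Module):
    def __init__(self, d_model=256, nhead=4, num_layers=2, max_tokens=1024):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a CNN backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.quality_token = nn.Parameter(torch.zeros(1, 1, d_model))
        self.pos_embed = nn.Parameter(torch.zeros(1, max_tokens + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 1)         # scalar quality score

    def forward(self, img):                       # img: (B, 3, H, W)
        f = self.backbone(img)                    # (B, C, h, w)
        tokens = f.flatten(2).transpose(1, 2)     # (B, h*w, C)
        tok = torch.cat([self.quality_token.expand(img.size(0), -1, -1), tokens], dim=1)
        tok = tok + self.pos_embed[:, : tok.size(1)]   # crop positional embedding to length
        return self.head(self.encoder(tok)[:, 0]) # score read from the quality token

score = TinyTRIQ()(torch.randn(2, 3, 64, 64))
print(score.shape)    # torch.Size([2, 1])
```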

Journal ArticleDOI
TL;DR: In this paper, a unified blind image quality assessment (BIQA) model was developed and an approach of training it for both synthetic and realistic distortions was proposed to confront the cross-distortion-scenario challenge.
Abstract: Performance of blind image quality assessment (BIQA) models has been significantly boosted by end-to-end optimization of feature engineering and quality regression. Nevertheless, due to the distributional shift between images simulated in the laboratory and captured in the wild, models trained on databases with synthetic distortions remain particularly weak at handling realistic distortions (and vice versa). To confront the cross-distortion-scenario challenge, we develop a unified BIQA model and an approach of training it for both synthetic and realistic distortions. We first sample pairs of images from individual IQA databases, and compute a probability that the first image of each pair is of higher quality. We then employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs. We also explicitly enforce a hinge constraint to regularize uncertainty estimation during optimization. Extensive experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in the laboratory and wild. In addition, we demonstrate the universality of the proposed training strategy by using it to improve existing BIQA models.
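The pairwise training signal described above can be sketched as follows: predicted quality means and variances for two images are turned into a preference probability under a Thurstone-style model and compared to the human preference with the fidelity loss. The exact parameterization is an assumption, not necessarily the authors' implementation.

```python
# Fidelity loss between a ground-truth preference probability and the model's
# predicted preference under a Thurstone-style comparison model.
import numpy as np
from scipy.stats import norm

def fidelity_loss(p_true: float, q_a: float, q_b: float, var_a: float, var_b: float) -> float:
    """p_true: probability (from MOS) that A is better; q, var: predicted mean/variance."""
    p_hat = norm.cdf((q_a - q_b) / np.sqrt(var_a + var_b + 1e-8))
    return 1.0 - np.sqrt(p_true * p_hat) - np.sqrt((1.0 - p_true) * (1.0 - p_hat))

# example: ground truth says A is clearly better, model agrees only weakly
print(fidelity_loss(p_true=0.9, q_a=0.3, q_b=0.1, var_a=1.0, var_b=1.0))
```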

Journal ArticleDOI
TL;DR: The proposed method, so-called a deep stacked Laplacian restorer (DSLR), is capable of separately recovering the global illumination and local details from the original input, and progressively combining them in the image space, and outperforms state-of-the-art methods.
Abstract: Various images captured in complicated lighting conditions often suffer from deterioration of the image quality. Such poor quality not only dissatisfies the user expectation but also may lead to a significant performance drop in many applications. In this paper, a novel method for low-light image enhancement is proposed by leveraging useful properties of the Laplacian pyramid both in image and feature spaces. Specifically, the proposed method, so-called a deep stacked Laplacian restorer (DSLR), is capable of separately recovering the global illumination and local details from the original input, and progressively combining them in the image space. Moreover, the Laplacian pyramid defined in the feature space makes such recovering processes more efficient based on abundant connections of higher-order residuals in a multiscale structure. This decomposition-based scheme is fairly desirable for learning the highly nonlinear relation between degraded images and their enhanced results. Experimental results on various datasets demonstrate that the proposed DSLR outperforms state-of-the-art methods. The code and model are publicly available at: https://github.com/SeokjaeLIM/DSLR-release .
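The Laplacian pyramid decomposition DSLR builds on can be illustrated with OpenCV; the sketch below constructs and exactly collapses a pyramid (the enhancement network itself is not reproduced), and the number of levels is an arbitrary choice.

```python
# Build a Laplacian pyramid (band-pass details + low-frequency residual) and
# collapse it back to the original image.
import cv2
import numpy as np

def build_laplacian_pyramid(img: np.ndarray, levels: int = 3):
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)          # band-pass detail at this scale
        current = down
    pyramid.append(current)                   # low-frequency residual (global illumination)
    return pyramid

def collapse(pyramid):
    current = pyramid[-1]
    for lap in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=(lap.shape[1], lap.shape[0])) + lap
    return current

img = np.random.default_rng(0).random((120, 160, 3)).astype(np.float32)
pyr = build_laplacian_pyramid(img)
print("reconstruction error:", float(np.abs(collapse(pyr) - img).max()))   # ~0
```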

Journal ArticleDOI
TL;DR: This article is an introductory overview aimed at clinical radiologists with no experience in deep‐learning‐based MR image reconstruction and should enable them to understand the basic concepts and current clinical applications of this rapidly growing area of research across multiple organ systems.
Abstract: Artificial intelligence (AI) shows tremendous promise in the field of medical imaging, with recent breakthroughs applying deep-learning models for data acquisition, classification problems, segmentation, image synthesis, and image reconstruction. With an eye towards clinical applications, we summarize the active field of deep-learning-based MR image reconstruction. We review the basic concepts of how deep-learning algorithms aid in the transformation of raw k-space data to image data, and specifically examine accelerated imaging and artifact suppression. Recent efforts in these areas show that deep-learning-based algorithms can match and, in some cases, eclipse conventional reconstruction methods in terms of image quality and computational efficiency across a host of clinical imaging applications, including musculoskeletal, abdominal, cardiac, and brain imaging. This article is an introductory overview aimed at clinical radiologists with no experience in deep-learning-based MR image reconstruction and should enable them to understand the basic concepts and current clinical applications of this rapidly growing area of research across multiple organ systems.

Proceedings ArticleDOI
19 Jun 2021
TL;DR: Zhang et al. as discussed by the authors proposed an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment (IQA) task.
Abstract: In this paper, we propose an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment (IQA) task. Perceptual representation becomes more important in image quality assessment. In this context, we extract the perceptual feature representations from each of the input images using a convolutional neural network (CNN) backbone. The extracted feature maps are fed into the transformer encoder and decoder in order to compare a reference and a distorted image. Following the approach of transformer-based vision models [18], [55], we use an extra learnable quality embedding and position embedding. The output of the transformer is passed to a prediction head in order to predict a final quality score. The experimental results show that our proposed model has outstanding performance on the standard IQA datasets. For a large-scale IQA dataset containing output images of generative models, our model also shows promising results. The proposed IQT was ranked first among 13 participants in the NTIRE 2021 perceptual image quality assessment challenge [23]. Our work will be an opportunity to further expand the approach for the perceptual IQA task.

Journal ArticleDOI
TL;DR: In this article, state-of-the-art image fusion methods at diverse levels are reviewed with their pros and cons, along with various spatial- and transform-based methods, quality metrics, and their applications in different domains.
Abstract: The need for image fusion is growing recently in image processing applications due to the tremendous number of acquisition systems. Fusion of images is defined as an alignment of noteworthy information from diverse sensors using various mathematical models to generate a single compound image. The fusion of images is used for integrating complementary multi-temporal, multi-view, and multi-sensor information into a single image with improved image quality while keeping the integrity of important features. It is considered a vital pre-processing phase for several applications such as robot vision, aerial and satellite imaging, medical imaging, and robot or vehicle guidance. In this paper, various state-of-the-art image fusion methods at diverse levels with their pros and cons, various spatial- and transform-based methods with quality metrics, and their applications in different domains are discussed. Finally, this review concludes with various future directions for different applications of image fusion.

Journal ArticleDOI
TL;DR: In this article, the importance of the f-number and speed of sound on image quality was discussed and a solution to set their values from a physical viewpoint was proposed, where the authors suggest determining the f-number from the directivity of the transducer elements and the speed-of-sound from the phase dispersion of the delayed signals.

Journal ArticleDOI
TL;DR: In this article, the authors discuss the foremost applications of artificial intelligence (AI), particularly deep learning (DL) algorithms, in single-photon emission computed tomography (SPECT) and positron emission tomography(PET) imaging.

Journal ArticleDOI
TL;DR: The proposed metric has demonstrated the state-of-the-art performance for predicting the subjective point cloud quality compared with multiple full-reference and no-reference models, e.g., the weighted peak signal-to-noise ratio (PSNR), structural similarity (SSIM), feature similarity (FSIM) and natural image quality evaluator (NIQE).
Abstract: Point clouds have emerged as a promising media format to represent realistic 3D objects or scenes in applications such as virtual reality, teleportation, etc. How to accurately quantify the subjective point cloud quality for application-driven optimization, however, is still a challenging and open problem. In this paper, we attempt to tackle this problem in a systematic manner. First, we produce a fairly large point cloud dataset where ten popular point clouds are augmented with seven types of impairments (e.g., compression, photometry/color noise, geometry noise, scaling) at six different distortion levels, and organize a formal subjective assessment with tens of subjects to collect mean opinion scores (MOS) for all 420 processed point cloud samples (PPCS). We then try to develop an objective metric that can accurately estimate the subjective quality. Towards this goal, we choose to project the 3D point cloud onto six perpendicular image planes of a cube for the color texture image and corresponding depth image, and aggregate image-based global (e.g., Jensen-Shannon (JS) divergence) and local features (e.g., edge, depth, pixel-wise similarity, complexity) among all projected planes for a final objective index. Model parameters are fixed constants after performing the regression using a small and independent dataset previously published. The proposed metric has demonstrated state-of-the-art performance for predicting subjective point cloud quality compared with multiple full-reference and no-reference models, e.g., the weighted peak signal-to-noise ratio (PSNR), structural similarity (SSIM), feature similarity (FSIM) and natural image quality evaluator (NIQE). The dataset is made publicly accessible at http://smt.sjtu.edu.cn or http://vision.nju.edu.cn for all interested audiences.

Proceedings ArticleDOI
08 Mar 2021
TL;DR: Zhang et al. as discussed by the authors proposed a teacher-student knowledge distillation approach to reduce the dependency of human parsing, where the try-on images produced by a parser-based method are used as supervisions to train a student without relying on segmentation.
Abstract: Image virtual try-on aims to fit a garment image (target clothes) to a person image. Prior methods are heavily based on human parsing. However, slightly-wrong segmentation results would lead to unrealistic try-on images with large artifacts. A recent pioneering work employed knowledge distillation to reduce the dependency of human parsing, where the try-on images produced by a parser-based method are used as supervisions to train a "student" network without relying on segmentation, making the student mimic the try-on ability of the parser-based model. However, the image quality of the student is bounded by the parser-based model. To address this problem, we propose a novel approach, "teacher-tutor-student" knowledge distillation, which is able to produce highly photo-realistic images without human parsing, possessing several appealing advantages compared to prior arts. (1) Unlike existing work, our approach treats the fake images produced by the parser-based method as "tutor knowledge", where the artifacts can be corrected by real "teacher knowledge", which is extracted from the real person images in a self-supervised way. (2) Other than using real images as supervisions, we formulate knowledge distillation in the try-on problem as distilling the appearance flows between the person image and the garment image, enabling us to find accurate dense correspondences between them to produce high-quality results. (3) Extensive evaluations show large superiority of our method (see Fig. 1).

Journal ArticleDOI
TL;DR: DLIR significantly reduced the image noise in chest LDCT scan images compared with ASiR-V 30% while maintaining superior image quality.
Abstract: OBJECTIVE Iterative reconstruction degrades image quality. Thus, further advances in image reconstruction are necessary to overcome some limitations of this technique in low-dose computed tomography (LDCT) scan of the chest. Deep-learning image reconstruction (DLIR) is a new method used to reduce dose while maintaining image quality. The purpose of this study was to evaluate the image quality and noise of LDCT scan images reconstructed with DLIR and compare them with those of images reconstructed with the adaptive statistical iterative reconstruction-Veo at a level of 30% (ASiR-V 30%). MATERIALS AND METHODS This retrospective study included 58 patients who underwent LDCT scan for lung cancer screening. Datasets were reconstructed with ASiR-V 30% and DLIR at medium and high levels (DLIR-M and DLIR-H, respectively). The objective image signal and noise, which represented mean attenuation value and standard deviation in Hounsfield units for the lungs, mediastinum, liver, and background air, and subjective image contrast, image noise, and conspicuity of structures were evaluated. The differences between CT scan images subjected to ASiR-V 30%, DLIR-M, and DLIR-H were evaluated. RESULTS Based on the objective analysis, the image signals did not significantly differ among ASiR-V 30%, DLIR-M, and DLIR-H (p = 0.949, 0.737, 0.366, and 0.358 in the lungs, mediastinum, liver, and background air, respectively). However, the noise was significantly lower in DLIR-M and DLIR-H than in ASiR-V 30% (all p < 0.001). DLIR had higher signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) than ASiR-V 30% (p = 0.027, < 0.001, and < 0.001 in the SNR of the lungs, mediastinum, and liver, respectively; all p < 0.001 in the CNR). According to the subjective analysis, DLIR had higher image contrast and lower image noise than ASiR-V 30% (all p < 0.001). DLIR was superior to ASiR-V 30% in identifying the pulmonary arteries and veins, trachea and bronchi, lymph nodes, and pleura and pericardium (all p < 0.001). CONCLUSION DLIR significantly reduced the image noise in chest LDCT scan images compared with ASiR-V 30% while maintaining superior image quality.
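For reference, the objective measures used in such comparisons can be sketched as below: noise as the standard deviation inside a homogeneous ROI, SNR as mean/SD, and CNR as the attenuation difference between two ROIs over the background noise. ROI placement and the exact CNR convention vary between studies; this is one common definition, not necessarily the authors' formula.

```python
# ROI-based noise, SNR, and CNR on a toy CT image with synthetic HU values.
import numpy as np

def roi_stats(ct_image: np.ndarray, roi_mask: np.ndarray):
    values = ct_image[roi_mask]
    return values.mean(), values.std()          # mean HU, noise (SD of HU)

def snr(mean_hu: float, noise: float) -> float:
    return mean_hu / noise

def cnr(mean_a: float, mean_b: float, background_noise: float) -> float:
    return abs(mean_a - mean_b) / background_noise

rng = np.random.default_rng(0)
img = rng.normal(40.0, 8.0, size=(128, 128))    # homogeneous region, mean ~40 HU
img[80:100, 80:100] += 60.0                     # a second, higher-attenuation region
mask1 = np.zeros_like(img, dtype=bool); mask1[40:60, 40:60] = True
mask2 = np.zeros_like(img, dtype=bool); mask2[80:100, 80:100] = True
m1, sd1 = roi_stats(img, mask1)
m2, _ = roi_stats(img, mask2)
print(f"mean={m1:.1f} HU, noise={sd1:.1f} HU, SNR={snr(m1, sd1):.2f}, CNR={cnr(m1, m2, sd1):.2f}")
```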

Journal ArticleDOI
TL;DR: In this paper, the authors describe how the quality of synthetic pictures created by DCGAN, LSGAN, and WGAN is determined and combine synthetic images with original images to enhance datasets and verify the effectiveness of synthetic datasets.
Abstract: Convolutional Neural Networks (CNN) achieve excellent traffic sign identification given enough annotated training data. The dataset determines the quality of the complete visual system based on CNN. Unfortunately, databases of traffic signs from the majority of the world’s nations are scarce. In this scenario, Generative Adversarial Networks (GAN) may be employed to produce more realistic and varied training pictures to supplement the actual arrangement of images. The purpose of this research is to describe how the quality of synthetic pictures created by DCGAN, LSGAN, and WGAN is determined. Our work combines synthetic images with original images to enhance datasets and verify the effectiveness of synthetic datasets. We use different numbers and sizes of images for training. Likewise, the Structural Similarity Index (SSIM) and Mean Square Error (MSE) were employed to assess picture quality. Our study quantifies the SSIM difference between the synthetic and actual images. When additional images are used for training, the synthetic image exhibits a high degree of resemblance to the genuine image. The highest SSIM value was achieved when using 200 total images as input and a 32 × 32 image size. Further, we augment the original picture dataset with synthetic pictures and compare the original image model to the synthesized image model. For this experiment, we are using the latest iterations of Yolo, Yolo V3 and Yolo V4. After mixing the real images with the synthesized images produced by LSGAN, the recognition performance improved, achieving an accuracy of 84.9% on Yolo V3 and an accuracy of 89.33% on Yolo V4.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a dual-domain residual-based optimization (DRONE) network, which consists of three modules respectively for embedding, refinement, and awareness, and the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module.
Abstract: Deep learning has attracted rapidly increasing attention in the field of tomographic image reconstruction, especially for CT, MRI, PET/SPECT, ultrasound and optical imaging. Among various topics, sparse-view CT remains a challenge which targets a decent image reconstruction from very few projections. To address this challenge, in this article we propose a Dual-domain Residual-based Optimization NEtwork (DRONE). DRONE consists of three modules respectively for embedding, refinement, and awareness. In the embedding module, a sparse sinogram is first extended. Then, sparse-view artifacts are effectively suppressed in the image domain. After that, the refinement module recovers image details in the residual data and image domains synergistically. Finally, the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module, which ensures the consistency between measurements and images with the kernel awareness of compressed sensing. The DRONE network is trained, validated, and tested on preclinical and clinical datasets, demonstrating its merits in edge preservation, feature recovery, and reconstruction accuracy.

Journal ArticleDOI
TL;DR: An approach, SMORE, based on convolutional neural networks (CNNs) that restores image quality by improving resolution and reducing aliasing in MR images is presented and is shown to be visually and quantitatively superior to previously reported methods.
Abstract: High resolution magnetic resonance (MR) images are desired in many clinical and research applications. Acquiring such images with high signal-to-noise (SNR), however, can require a long scan duration, which is difficult for patient comfort, is more costly, and makes the images susceptible to motion artifacts. A very common practical compromise for both 2D and 3D MR imaging protocols is to acquire volumetric MR images with high in-plane resolution, but lower through-plane resolution. In addition to having poor resolution in one orientation, 2D MRI acquisitions will also have aliasing artifacts, which further degrade the appearance of these images. This paper presents an approach, SMORE, based on convolutional neural networks (CNNs) that restores image quality by improving resolution and reducing aliasing in MR images. This approach is self-supervised, which requires no external training data because the high-resolution and low-resolution data that are present in the image itself are used for training. For 3D MRI, the method consists of only one self-supervised super-resolution (SSR) deep CNN that is trained from the volumetric image data. For 2D MRI, there is a self-supervised anti-aliasing (SAA) deep CNN that precedes the SSR CNN, also trained from the volumetric image data. Both methods were evaluated on a broad collection of MR data, including filtered and downsampled images so that quantitative metrics could be computed and compared, and actual acquired low resolution images for which visual and sharpness measures could be computed and compared. The super-resolution method is shown to be visually and quantitatively superior to previously reported methods.

Journal ArticleDOI
TL;DR: In this article, a neural nano-optic image reconstruction method was proposed to learn a metasurface physical structure in conjunction with a neural feature-based image reconstruction algorithm.
Abstract: Nano-optic imagers that modulate light at sub-wavelength scales could enable new applications in diverse domains ranging from robotics to medicine. Although metasurface optics offer a path to such ultra-small imagers, existing methods have achieved image quality far worse than bulky refractive alternatives, fundamentally limited by aberrations at large apertures and low f-numbers. In this work, we close this performance gap by introducing a neural nano-optics imager. We devise a fully differentiable learning framework that learns a metasurface physical structure in conjunction with a neural feature-based image reconstruction algorithm. Experimentally validating the proposed method, we achieve an order of magnitude lower reconstruction error than existing approaches. As such, we present a high-quality, nano-optic imager that combines the widest field-of-view for full-color metasurface operation while simultaneously achieving the largest demonstrated aperture of 0.5 mm at an f-number of 2.

Journal ArticleDOI
TL;DR: The high-strength “TrueFidelity” approach generated the best image quality among the examined image reconstruction procedures, and TF-H was the most balanced image in terms of image noise and sharpness among the examined image combinations.
Abstract: To compare image noise and sharpness of vessels, liver, and muscle in lower extremity CT angiography between “adaptive statistical iterative reconstruction-V” (ASIR-V) and deep learning reconstruction “TrueFidelity” (TFI). Thirty-seven patients (mean age, 65.2 years; 32 men) with lower extremity CT angiography were enrolled between November and December 2019. Images were reconstructed with two ASIR-V settings (blending factors of 80% (AV-80) and 100% (AV-100)) and three TFI settings (low-, medium-, and high-strength-level (TF-H)). Two radiologists evaluated these images for vessels (aorta, femoral artery, and popliteal artery), liver, and psoas muscle. For quantitative analyses, conventional indicators (CT number, image noise, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR)) and blur metric values (indicating the degree of image sharpness) of selected regions of interest were determined. For qualitative analyses, the degrees of quantum mottle and blurring were assessed. The higher the blending factor in ASIR-V or the strength in TFI, the lower the noise, the higher the SNR and CNR values, and the higher the blur metric values in all structures. The SNR and CNR values of TF-H images were significantly higher than those of AV-80 images and similar to those of AV-100 images. The blur metric values in TFI images were significantly lower than those in ASIR-V images (p < 0.001), indicating increased sharpness. Among all the investigated image procedures, the overall qualitative image quality was best in TF-H images. TF-H was the most balanced image in terms of image noise and sharpness among the examined image combinations. • Deep learning image reconstruction “TrueFidelity” is superior to iterative reconstruction “ASIR-V” regarding image noise and sharpness. • The high-strength “TrueFidelity” approach generated the best image quality among the examined image reconstruction procedures. • In iterative and deep learning CT image reconstruction, the higher the blending and strength factors, the lower the image noise and the poorer the image sharpness.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a methodology to eliminate unnecessary reflectance properties of the images using a novel image processing schema and a stacked deep learning technique for the diagnosis of diabetic retinopathy.
Abstract: Diabetic retinopathy (DR) is a diabetes complication that affects the eye and can cause damage from mild vision problems to complete blindness. It has been observed that the eye fundus images show various kinds of color aberrations and irrelevant illuminations, which degrade the diagnostic analysis and may hinder the results. In this research, we present a methodology to eliminate these unnecessary reflectance properties of the images using a novel image processing schema and a stacked deep learning technique for the diagnosis. For the luminosity normalization of the image, the gray world color constancy algorithm is implemented which does image desaturation and improves the overall image quality. The effectiveness of the proposed image enhancement technique is evaluated based on the peak signal to noise ratio (PSNR) and mean squared error (MSE) of the normalized image. To develop a deep learning based computer-aided diagnostic system, we present a novel methodology of stacked generalization of convolution neural networks (CNN). Three custom CNN model weights are fed on the top of a single meta-learner classifier, which combines the most optimum weights of the three sub-neural networks to obtain superior metrics of evaluation and robust prediction results. The proposed stacked model reports an overall test accuracy of 97.92% (binary classification) and 87.45% (multi-class classification). Extensive experimental results in terms of accuracy, F-measure, sensitivity, specificity, recall and precision reveal that the proposed methodology of illumination normalization greatly facilitated the deep learning model and yields better results than various state-of-art techniques.