scispace - formally typeset

Showing papers on "Image processing" published in 2020


Posted Content
TL;DR: A comprehensive review is provided of recent pioneering efforts in semantic and instance segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings.
Abstract: Image segmentation is a key topic in image processing and computer vision, with applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among many others. Various algorithms for image segmentation have been developed in the literature. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of work aimed at developing image segmentation approaches using deep learning models. In this survey, we provide a comprehensive review of the literature at the time of this writing, covering a broad spectrum of pioneering works for semantic and instance-level segmentation, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the similarities, strengths and challenges of these deep learning models, examine the most widely used datasets, report performances, and discuss promising future research directions in this area.

950 citations
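Surveys of this kind usually report segmentation performance as mean intersection-over-union (mIoU): for each class, the overlap between predicted and ground-truth pixels divided by their union, averaged over classes. A minimal NumPy sketch, assuming integer label maps of equal shape; skipping classes absent from both maps is an assumed convention (benchmarks differ on this):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union (mIoU) between two integer label maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # assumed convention: skip classes absent from both maps
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1], [1, 1, 2]])
target = np.array([[0, 1, 1], [1, 1, 2]])
print(mean_iou(pred, target, num_classes=3))  # IoUs 0.5, 0.75, 1.0 -> 0.75
```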


Journal Article
TL;DR: Albumentations is a fast and flexible open source library for image augmentation that offers a wide variety of image transform operations and is also an easy-to-use wrapper around other augmentation libraries.
Abstract: Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve the corresponding output labels. In computer vision, image augmentations have become a common implicit regularization technique to combat overfitting in deep learning models and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations of flipping, rotating, scaling, and cropping. Moreover, image processing speed varies across existing image augmentation libraries. We present Albumentations, a fast and flexible open source library for image augmentation that offers a wide variety of image transform operations and is also an easy-to-use wrapper around other augmentation libraries. We discuss the design principles that drove the implementation of Albumentations and give an overview of its key features and distinctive capabilities. Finally, we provide examples of image augmentations for different computer vision tasks and demonstrate that Albumentations is faster than other commonly used image augmentation tools on most image transform operations.

806 citations
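The label-preservation requirement above means that a geometric transform applied to an image must be applied identically to its dense label. A minimal NumPy sketch of a joint random horizontal flip and random crop for an image and its segmentation mask; `flip_and_crop` is a hypothetical helper written for illustration, not part of the Albumentations API:

```python
import numpy as np

def flip_and_crop(image, mask, crop_h, crop_w, p_flip=0.5, rng=None):
    """Apply one random horizontal flip and one random crop to image AND mask.

    image: (H, W, C) array; mask: (H, W) label array with the same H and W.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape
    if rng.random() < p_flip:  # one flip decision shared by both arrays
        image, mask = image[:, ::-1], mask[:, ::-1]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window], mask[window]

rng = np.random.default_rng(0)
img = rng.random((64, 48, 3))
msk = (img[..., 0] > 0.5).astype(np.uint8)  # toy label computed from the image
aug_img, aug_msk = flip_and_crop(img, msk, 32, 32, rng=rng)
# The label still matches the transformed image pixel for pixel.
assert np.array_equal(aug_msk, (aug_img[..., 0] > 0.5).astype(np.uint8))
```

Libraries such as Albumentations package transforms of this kind so that masks, bounding boxes, and keypoints are updated consistently with the image; the sketch only shows the underlying idea.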


Posted Content
TL;DR: To fully exploit the capability of the transformer, the image processing transformer (IPT) uses the well-known ImageNet benchmark to generate a large number of corrupted image pairs, and contrastive learning is introduced to adapt the model well to different image processing tasks.
Abstract: As the computing power of modern hardware grows rapidly, pre-trained deep learning models (e.g., BERT, GPT-3) learned on large-scale datasets have shown their effectiveness over conventional methods. This progress is mainly attributed to the representation ability of the transformer and its variant architectures. In this paper, we study low-level computer vision tasks (e.g., denoising, super-resolution and deraining) and develop a new pre-trained model, namely the image processing transformer (IPT). To fully exploit the capability of the transformer, we use the well-known ImageNet benchmark to generate a large number of corrupted image pairs. The IPT model is trained on these images with multiple heads and multiple tails. In addition, contrastive learning is introduced to adapt the model well to different image processing tasks. The pre-trained model can therefore be efficiently employed on a desired task after fine-tuning. With only one pre-trained model, IPT outperforms the current state-of-the-art methods on various low-level benchmarks. Code is available at this https URL and this https URL

631 citations
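The idea of turning one clean dataset into many (corrupted, clean) pairs for several low-level tasks can be sketched as follows. The degradations used here (additive Gaussian noise at a few levels, 2x average-pool downsampling) and the function names are assumptions chosen for illustration, not IPT's exact data pipeline:

```python
import numpy as np

def add_noise(img, sigma, rng):
    """Denoising pair input: clean image plus Gaussian noise, clipped to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def downsample2x(img):
    """Super-resolution pair input: 2x average pooling of an (H, W) image."""
    h, w = img.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even height and width"
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def make_pairs(clean_images, rng, sigmas=(15 / 255, 25 / 255, 50 / 255)):
    """Return (task, corrupted_input, clean_target) triples built from clean images."""
    pairs = []
    for img in clean_images:
        for s in sigmas:
            pairs.append(("denoise", add_noise(img, s, rng), img))
        pairs.append(("sr_x2", downsample2x(img), img))
    return pairs

rng = np.random.default_rng(0)
clean = [rng.random((8, 8)) for _ in range(2)]
pairs = make_pairs(clean, rng)
print(len(pairs))  # 2 images x (3 noise levels + 1 downsampling) = 8
```

The task tag in each triple is where a multi-head, multi-tail model such as IPT would route the sample to a task-specific head and tail.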


Journal Article
TL;DR: A novel framework for point cloud learning is presented, based on the Transformer, which has achieved huge success in natural language processing and shows great potential in image processing; it is inherently permutation invariant when processing a sequence of points, making it well suited for point cloud learning.
Abstract: The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well suited for point cloud learning. To better capture local context within the point cloud, we enhance the input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation and normal estimation tasks.

536 citations
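Farthest point sampling (FPS) and nearest neighbor search, which PCT uses to enrich its input embedding, can be sketched with brute-force NumPy as below. This is a generic illustration rather than PCT's implementation; starting FPS from the first point is an assumption (some implementations start from a random point), and the quadratic-memory k-NN is only suitable for small clouds:

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Greedily pick m indices so each new point is farthest from those already chosen."""
    n = points.shape[0]
    chosen = np.zeros(m, dtype=int)   # chosen[0] = 0: start from the first point
    dist = np.full(n, np.inf)         # squared distance to the nearest chosen point
    for i in range(1, m):
        d = np.sum((points - points[chosen[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = int(np.argmax(dist))
    return chosen

def knn(points, centers, k):
    """Indices of the k nearest points (Euclidean) to each center, shape (M, k)."""
    d = np.sum((centers[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    return np.argsort(d, axis=1)[:, :k]

rng = np.random.default_rng(0)
cloud = rng.random((1024, 3))
idx = farthest_point_sampling(cloud, 128)   # 128 well-spread sample points
groups = knn(cloud, cloud[idx], k=16)       # (128, 16) local neighborhoods
```

Each row of `groups` is a local neighborhood whose features could then be aggregated into the embedding of its sampled center point.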


Book
14 Feb 2020
TL;DR: This book presents a meta-modelling framework for 3D vision applications that automates the very labor-intensive, and therefore time-consuming and expensive, process of 3D image processing.
Abstract: Preface. Acknowledgements. Notation and Abbreviations.
Part I.
1 Introduction. 1.1 Stereo-pair Images and Depth Perception. 1.2 3D Vision Systems. 1.3 3D Vision Applications. 1.4 Contents Overview: The 3D Vision Task in Stages.
2 Brief History of Research on Vision. 2.1 Abstract. 2.2 Retrospective of Vision Research. 2.3 Closure.
Part II.
3 2D and 3D Vision Formation. 3.1 Abstract. 3.2 Human Visual System. 3.3 Geometry and Acquisition of a Single Image. 3.4 Stereoscopic Acquisition Systems. 3.5 Stereo Matching Constraints. 3.6 Calibration of Cameras. 3.7 Practical Examples. 3.8 Appendix: Derivation of the Pin-hole Camera Transformation. 3.9 Closure.
4 Low-level Image Processing for Image Matching. 4.1 Abstract. 4.2 Basic Concepts. 4.3 Discrete Averaging. 4.4 Discrete Differentiation. 4.5 Edge Detection. 4.6 Structural Tensor. 4.7 Corner Detection. 4.8 Practical Examples. 4.9 Closure.
5 Scale-space Vision. 5.1 Abstract. 5.2 Basic Concepts. 5.3 Constructing a Scale-space. 5.4 Multi-resolution Pyramids. 5.5 Practical Examples. 5.6 Closure.
6 Image Matching Algorithms. 6.1 Abstract. 6.2 Basic Concepts. 6.3 Match Measures. 6.4 Computational Aspects of Matching. 6.5 Diversity of Stereo Matching Methods. 6.6 Area-based Matching. 6.7 Area-based Elastic Matching. 6.8 Feature-based Image Matching. 6.9 Gradient-based Matching. 6.10 Method of Dynamic Programming. 6.11 Graph Cut Approach. 6.12 Optical Flow. 6.13 Practical Examples. 6.14 Closure.
7 Space Reconstruction and Multiview Integration. 7.1 Abstract. 7.2 General 3D Reconstruction. 7.3 Multiview Integration. 7.4 Closure.
8 Case Examples. 8.1 Abstract. 8.2 3D System for Vision-Impaired Persons. 8.3 Face and Body Modelling. 8.4 Clinical and Veterinary Applications. 8.5 Movie Restoration. 8.6 Closure.
Part III.
9 Basics of the Projective Geometry. 9.1 Abstract. 9.2 Homogeneous Coordinates. 9.3 Point, Line and the Rule of Duality. 9.4 Point and Line at Infinity. 9.5 Basics on Conics. 9.6 Group of Projective Transformations. 9.7 Projective Invariants. 9.8 Closure.
10 Basics of Tensor Calculus for Image Processing. 10.1 Abstract. 10.2 Basic Concepts. 10.3 Change of a Base. 10.4 Laws of Tensor Transformations. 10.5 The Metric Tensor. 10.6 Simple Tensor Algebra. 10.7 Closure.
11 Distortions and Noise in Images. 11.1 Abstract. 11.2 Types and Models of Noise. 11.3 Generating Noisy Test Images. 11.4 Generating Random Numbers with Normal Distributions. 11.5 Closure.
12 Image Warping Procedures. 12.1 Abstract. 12.2 Architecture of the Warping System. 12.3 Coordinate Transformation Module. 12.4 Interpolation of Pixel Values. 12.5 The Warp Engine. 12.6 Software Model of the Warping Schemes. 12.7 Warp Examples. 12.8 Finding the Linear Transformation from Point Correspondences. 12.9 Closure.
13 Programming Techniques for Image Processing and Computer Vision. 13.1 Abstract. 13.2 Useful Techniques and Methodology. 13.3 Design Patterns. 13.4 Object Lifetime and Memory Management. 13.5 Image Processing Platforms. 13.6 Closure.
14 Image Processing Library.
References. Index.

365 citations


Journal Article
TL;DR: This study demonstrates how transfer learning from deep learning models can be used to perform COVID-19 detection using images from the three most commonly used medical imaging modes (X-ray, ultrasound, and CT scan), with the aim of providing over-stressed medical professionals a second pair of eyes through intelligent deep learning image classification models.
Abstract: Detecting COVID-19 early may help in devising an appropriate treatment plan and disease containment decisions. In this study, we demonstrate how transfer learning from deep learning models can be used to perform COVID-19 detection using images from the three most commonly used medical imaging modes: X-ray, ultrasound, and CT scan. The aim is to provide over-stressed medical professionals a second pair of eyes through intelligent deep learning image classification models. We identify a suitable convolutional neural network (CNN) model through an initial comparative study of several popular CNN models. We then optimize the selected VGG19 model for each image modality to show how the models can be used with the highly scarce and challenging COVID-19 datasets. We highlight the challenges (including dataset size and quality) in utilizing currently available public COVID-19 datasets for developing useful deep learning models, and how these challenges adversely impact the trainability of complex models. We also propose an image pre-processing stage to create a trustworthy image dataset for developing and testing the deep learning models. This stage aims to reduce unwanted noise in the images so that deep learning models can focus on detecting diseases from their specific features. Our results indicate that ultrasound images provide superior detection accuracy compared to X-ray and CT scans. The experimental results highlight that, with limited data, most of the deeper networks struggle to train well and provide less consistency across the three imaging modes we use. The selected VGG19 model, after extensive tuning with appropriate parameters, achieves considerable COVID-19 detection performance against pneumonia or normal cases for all three lung imaging modes, with precision of up to 86% for X-ray, 100% for ultrasound, and 84% for CT scans.

349 citations


Journal Article
TL;DR: In this article, a new PDE interpretation is established for a class of deep convolutional neural networks (CNNs) that are commonly used to learn from speech, image, and video data.
Abstract: Partial differential equations (PDEs) are indispensable for modeling many physical phenomena and are also commonly used for solving image processing tasks. In the latter area, PDE-based approaches interpret image data as discretizations of multivariate functions and the output of image processing algorithms as solutions to certain PDEs. Posing image processing problems in the infinite-dimensional setting provides powerful tools for their analysis and solution. Over the last few decades, the reinterpretation of classical image processing problems through the PDE lens has produced multiple celebrated approaches that benefit a vast range of tasks, including image segmentation, denoising, registration, and reconstruction. In this paper, we establish a new PDE interpretation of a class of deep convolutional neural networks (CNNs) that are commonly used to learn from speech, image, and video data. Our interpretation includes convolutional residual neural networks (ResNets), which are among the most promising approaches for tasks such as image classification, having improved the state-of-the-art performance in prestigious benchmark challenges. Despite their recent successes, deep ResNets still face some critical challenges associated with their design, their immense computational costs and memory requirements, and the lack of understanding of their reasoning. Guided by well-established PDE theory, we derive three new ResNet architectures that fall into two new classes: parabolic and hyperbolic CNNs. We demonstrate how PDE theory can provide new insights and algorithms for deep learning and show the competitiveness of the three new CNN architectures using numerical experiments.

329 citations
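Residual blocks can be read as forward-Euler steps Y_{j+1} = Y_j + h F(Y_j, theta_j) of an ODE, which is the starting point for PDE-motivated architectures. The sketch below illustrates a parabolic-type step Y_{j+1} = Y_j - h K^T tanh(K Y_j), with K the circulant matrix of a 1D three-tap filter; the filter values, the tanh activation, and dropping bias and normalization are simplifying assumptions, not the paper's exact architectures. Because z tanh(z) >= tanh(z)^2, one can show ||Y_{j+1}|| <= ||Y_j|| whenever h ||K||_2^2 <= 2, a small example of the stability guarantees PDE theory provides:

```python
import numpy as np

def circulant_filter_matrix(taps, n):
    """Dense n x n matrix applying a 3-tap filter with periodic boundaries."""
    K = np.zeros((n, n))
    for i in range(n):
        for offset, w in zip((-1, 0, 1), taps):
            K[i, (i + offset) % n] = w
    return K

def parabolic_step(Y, K, h):
    """One forward-Euler step of the parabolic system dY/dt = -K^T tanh(K Y)."""
    return Y - h * K.T @ np.tanh(K @ Y)

n = 32
K = circulant_filter_matrix(np.array([-1.0, 2.0, -0.5]), n)
h = 1.0 / np.linalg.norm(K, 2) ** 2  # step size with h * ||K||_2^2 = 1 <= 2
rng = np.random.default_rng(0)
Y = rng.normal(size=n)
norms = [np.linalg.norm(Y)]
for _ in range(20):
    Y = parabolic_step(Y, K, h)
    norms.append(np.linalg.norm(Y))
# norms is non-increasing, as the stability argument predicts
```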


Journal Article
TL;DR: Flat optics for direct image differentiation are demonstrated, significantly shrinking the required optical system and reducing the size and complexity of conventional optical systems.
Abstract: Image processing has become a critical technology in a variety of science and engineering disciplines. Although most image processing is performed digitally, optical analog processing has the advantages of being low-power and high-speed, but it requires a large volume. Here, we demonstrate flat optics for direct image differentiation, allowing us to significantly shrink the required optical system size. We first demonstrate how the differentiator can be combined with traditional imaging systems such as a commercial optical microscope and camera sensor for edge detection with a numerical aperture of up to 0.32. We next demonstrate how the entire processing system can be realized as a monolithic compound flat optic by integrating the differentiator with a metalens. The compound nanophotonic system offers a thin form factor as well as the ability to implement complex transfer functions, and could open new opportunities in applications such as biological imaging and computer vision. Vertically integrating a metalens realizes compound nanophotonic systems for optical analog image processing, significantly reducing the size and complexity of conventional optical systems.

256 citations
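Optical analog differentiation applies a transfer function to the image's spatial-frequency spectrum; an ideal first-order x-differentiator has H(k_x) = i k_x, so edges (rapid intensity changes) produce strong output while uniform regions produce none. The NumPy sketch below performs the same operation digitally on a periodic image, for intuition only; it does not model the metasurface, and a physical device approximates the ideal response only over a limited range of spatial frequencies:

```python
import numpy as np

def fourier_derivative_x(img, dx=1.0):
    """d/dx of a periodic 2D image: multiply each row's spectrum by i*k_x."""
    n = img.shape[1]
    kx = 2 * np.pi * np.fft.fftfreq(n, d=dx)   # angular spatial frequencies
    spectrum = np.fft.fft(img, axis=1)
    return np.real(np.fft.ifft(1j * kx[None, :] * spectrum, axis=1))

n = 64
x = np.arange(n) * 2 * np.pi / n
img = np.tile(np.sin(x), (4, 1))               # 4 identical rows of sin(x)
edges = fourier_derivative_x(img, dx=2 * np.pi / n)
print(np.allclose(edges, np.cos(x)))           # d/dx sin(x) = cos(x) -> True
```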


Journal Article
TL;DR: A new data augmentation technique called random image cropping and patching (RICAP) is proposed, which randomly crops four images and patches them to create a new training image; it achieves a new state-of-the-art test error of 2.19% on CIFAR-10.
Abstract: Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expressive power risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP), which randomly crops four images and patches them to create a new training image. Moreover, RICAP mixes the class labels of the four images, gaining the advantages of soft labels. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) in comparison with competitive data augmentation techniques such as cutout and mixup. RICAP achieves a new state-of-the-art test error of 2.19% on CIFAR-10. We also confirmed that deep CNNs with RICAP achieve better results on classification tasks using CIFAR-100 and ImageNet, an image-caption retrieval task using Microsoft COCO, and other computer vision tasks.

256 citations
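The crop-and-patch step can be sketched as follows: a random boundary point (w, h) splits the output into four regions, each region is filled with a random crop of the same size from one of the four images, and the soft label weights each image's class by its area fraction. Drawing the boundary fractions from a Beta(beta, beta) distribution reflects our reading of RICAP, and `beta=0.3` is a placeholder hyperparameter; treat both as assumptions:

```python
import numpy as np

def ricap(images, labels, num_classes, beta=0.3, rng=None):
    """Random image cropping and patching (sketch).

    images: (4, H, W, C) array of four training images.
    labels: four integer class labels.
    Returns one patched image and its area-weighted soft label.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, H, W, C = images.shape
    # Boundary point splitting the output into four regions.
    w = int(np.round(W * rng.beta(beta, beta)))
    h = int(np.round(H * rng.beta(beta, beta)))
    sizes = [(h, w), (h, W - w), (H - h, w), (H - h, W - w)]  # (rows, cols)
    offsets = [(0, 0), (0, w), (h, 0), (h, w)]                # top-left in output
    out = np.zeros((H, W, C), dtype=images.dtype)
    soft = np.zeros(num_classes)
    for k, ((hk, wk), (oy, ox)) in enumerate(zip(sizes, offsets)):
        # Random crop of size (hk, wk) from the k-th image.
        y = rng.integers(0, H - hk + 1)
        x = rng.integers(0, W - wk + 1)
        out[oy:oy + hk, ox:ox + wk] = images[k, y:y + hk, x:x + wk]
        soft[labels[k]] += hk * wk / (H * W)  # label weight = area fraction
    return out, soft

rng = np.random.default_rng(0)
imgs = rng.random((4, 32, 32, 3))
img, soft = ricap(imgs, [0, 1, 2, 3], num_classes=10, rng=rng)
print(img.shape, soft.sum())  # (32, 32, 3) and weights summing to (about) 1
```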


Journal Article
TL;DR: This paper presents novel loss functions for training convolutional neural network (CNN)-based segmentation methods with the goal of directly reducing the Hausdorff distance (HD); three such loss functions are proposed.
Abstract: The Hausdorff distance (HD) is widely used in evaluating medical image segmentation methods. However, existing segmentation methods do not attempt to reduce HD directly. In this paper, we present novel loss functions for training convolutional neural network (CNN)-based segmentation methods with the goal of reducing HD directly. We propose three methods to estimate HD from the segmentation probability map produced by a CNN. One method makes use of the distance transform of the segmentation boundary. Another method is based on applying morphological erosion to the difference between the true and estimated segmentation maps. The third method works by applying circular/spherical convolution kernels of different radii to the segmentation probability maps. Based on these three methods for estimating HD, we suggest three loss functions that can be used for training to reduce HD. We use these loss functions to train CNNs for segmentation of the prostate, liver, and pancreas in ultrasound, magnetic resonance, and computed tomography images and compare the results with commonly used loss functions. Our results show that the proposed loss functions can lead to an approximately 18–45% reduction in HD without degrading other segmentation performance criteria such as the Dice similarity coefficient. The proposed loss functions can be used for training medical image segmentation methods in order to reduce large segmentation errors.

238 citations
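For reference, the symmetric Hausdorff distance between two masks A and B is the larger of the two directed distances max over a in A of min over b in B of ||a - b||. The brute-force NumPy sketch below computes it from foreground pixel coordinates and also shows a distance-transform-weighted loss in the spirit of the first method above. The loss form (squared error weighted by d_t^alpha + d_p^alpha with alpha = 2), the brute-force distance map, and all function names are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def _pairwise(a, b):
    """Euclidean distances between (N, 2) and (M, 2) coordinate arrays."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))

def hausdorff(mask_a, mask_b):
    """Symmetric Hausdorff distance between foreground pixels (both non-empty)."""
    d = _pairwise(np.argwhere(mask_a).astype(float), np.argwhere(mask_b).astype(float))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def boundary_distance(mask):
    """Brute-force distance from every pixel to the nearest pixel of the other class."""
    mask = np.asarray(mask, dtype=bool)
    coords = np.argwhere(np.ones(mask.shape, dtype=bool)).astype(float)  # C order
    flat = mask.ravel()
    out = np.zeros(mask.shape)
    for label in (True, False):
        src, dst = coords[flat == label], coords[flat != label]
        if len(src) and len(dst):
            out[mask == label] = _pairwise(src, dst).min(axis=1)
    return out

def dt_weighted_loss(prob, target, alpha=2.0):
    """Squared error weighted by target and prediction distance maps (assumed form)."""
    target = np.asarray(target, dtype=float)
    d_t = boundary_distance(target > 0.5)
    d_p = boundary_distance(prob > 0.5)
    return float(np.mean((prob - target) ** 2 * (d_t ** alpha + d_p ** alpha)))

a = np.zeros((10, 10), dtype=bool)
a[2:5, 2:5] = True
b = np.zeros((10, 10), dtype=bool)
b[2:5, 2:8] = True
print(hausdorff(a, b))  # 3.0: column 7 of b is 3 pixels from column 4 of a
```

Errors far from the true boundary receive large weights, which is how such a loss targets the large errors that dominate HD.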


Journal Article
TL;DR: Zhang et al. proposed a multi-level image decomposition method based on latent low-rank representation (LatLRR), called MDLatLRR, which is used to decompose source images into detail parts and base parts.
Abstract: Image decomposition is crucial for many image processing tasks, as it allows salient features to be extracted from source images. A good image decomposition method can lead to better performance, especially in image fusion tasks. We propose a multi-level image decomposition method based on latent low-rank representation (LatLRR), which is called MDLatLRR. This decomposition method is applicable to many image processing fields. In this paper, we focus on the image fusion task. We build a novel image fusion framework based on MDLatLRR, which is used to decompose source images into detail parts (salient features) and base parts. A nuclear-norm-based fusion strategy is used to fuse the detail parts, and the base parts are fused by an averaging strategy. Compared with other state-of-the-art fusion methods, the proposed algorithm exhibits better fusion performance in both subjective and objective evaluations.
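The two fusion rules above (averaging for base parts, a nuclear-norm-based strategy for detail parts) can be sketched as follows. This is a hedged illustration: a box-blur base/detail split stands in for the LatLRR decomposition, which is not reproduced here, and weighting each non-overlapping patch by its relative nuclear norm (the sum of its singular values) is an assumed form of the nuclear-norm strategy. `box_blur` and `fuse` are hypothetical names:

```python
import numpy as np

def box_blur(img, r=2):
    """Mean filter with edge padding; a stand-in for the LatLRR base part."""
    padded = np.pad(img, r, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    size = 2 * r + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / size ** 2

def fuse(img1, img2, patch=8, eps=1e-12):
    """Average the base parts; blend detail patches by relative nuclear norm."""
    base1, base2 = box_blur(img1), box_blur(img2)
    det1, det2 = img1 - base1, img2 - base2
    fused_base = 0.5 * (base1 + base2)                 # averaging strategy
    fused_det = np.zeros_like(det1)
    for y in range(0, img1.shape[0], patch):
        for x in range(0, img1.shape[1], patch):
            p1 = det1[y:y + patch, x:x + patch]
            p2 = det2[y:y + patch, x:x + patch]
            n1 = np.linalg.norm(p1, "nuc")             # sum of singular values
            n2 = np.linalg.norm(p2, "nuc")
            w1 = (n1 + eps) / (n1 + n2 + 2 * eps)      # more detail, more weight
            fused_det[y:y + patch, x:x + patch] = w1 * p1 + (1 - w1) * p2
    return fused_base + fused_det

rng = np.random.default_rng(0)
visible, infrared = rng.random((32, 32)), rng.random((32, 32))
fused = fuse(visible, infrared)
```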

Journal Article
TL;DR: An improved deep fully convolutional neural network, named CrackSegNet, is proposed to conduct dense pixel-wise crack segmentation, making tunnel inspection and monitoring highly efficient, low cost, and eventually automatable.

Journal Article
TL;DR: In this review, the basics of deep learning methods are discussed along with an overview of successful implementations involving image segmentation for different medical applications and the future need for further improvements is pointed out.

Journal Article
TL;DR: It is shown that the model exceeds the state of the art in saliency detection of magnetic tiles, in which it both effectively and explicitly maps multiple surface defects from low-contrast images.
Abstract: Computer vision builds a connection between image processing and industry, bringing modern perception to the automated manufacture of magnetic tiles. In this article, we propose a real-time model called MCuePush U-Net, specifically designed for saliency detection of surface defects. Our model consists of three main components: MCue, U-Net and the Push network. MCue generates three-channel resized inputs, including one MCue saliency image and two raw images; U-Net learns the most informative regions and is essentially a deep, hierarchically structured convolutional network; the Push network localizes the predicted surface defects with bounding boxes and is constructed from two fully connected layers and one output layer. We show that the model exceeds the state of the art in saliency detection of magnetic tiles, effectively and explicitly mapping multiple surface defects from low-contrast images. The proposed model significantly reduces the machine processing time from 0.5 s per image to 0.07 s and enhances detection accuracy for image-based defect examinations.

Journal Article
TL;DR: This article reviews datasets and visual attention modelling approaches for 360° video/image, surveys subjective and objective quality assessment, and overviews compression approaches that utilize either the spherical characteristics or visual attention models.
Abstract: Nowadays, 360° video/image has become increasingly popular and has drawn great attention. The spherical viewing range of 360° video/image results in huge amounts of data, which pose challenges to 360° video/image processing in solving the bottlenecks of storage, transmission, etc. Accordingly, recent years have witnessed an explosive emergence of works on 360° video/image processing. In this article, we review the state-of-the-art works on 360° video/image processing from the aspects of perception, assessment and compression. First, this article reviews both datasets and visual attention modelling approaches for 360° video/image. Second, we survey the related works on both subjective and objective visual quality assessment (VQA) of 360° video/image. Third, we overview the compression approaches for 360° video/image, which utilize either the spherical characteristics or visual attention models. Finally, we summarize this overview article and outline future research trends in 360° video/image processing.
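As one example of objective quality assessment for 360° content, weighted-to-spherically-uniform PSNR (WS-PSNR) weights each row of an equirectangular frame by the cosine of its latitude, so that the oversampled polar rows count less. A minimal NumPy sketch, assuming single-channel 8-bit equirectangular frames; the metric is chosen here only for illustration:

```python
import numpy as np

def ws_psnr(ref, dist, max_val=255.0):
    """WS-PSNR between two equirectangular frames of shape (H, W)."""
    ref = ref.astype(float)
    dist = dist.astype(float)
    h, w = ref.shape
    rows = np.arange(h)
    weights = np.cos((rows + 0.5 - h / 2) * np.pi / h)  # cosine of row latitude
    wmap = np.repeat(weights[:, None], w, axis=1)
    wmse = np.sum(wmap * (ref - dist) ** 2) / np.sum(wmap)
    if wmse == 0:
        return float("inf")
    return float(10 * np.log10(max_val ** 2 / wmse))

ref = np.zeros((16, 32), dtype=np.uint8)
near_pole = ref.copy()
near_pole[0, :] = 50      # error in the top (polar) row
at_equator = ref.copy()
at_equator[8, :] = 50     # same error in an equatorial row
print(ws_psnr(ref, near_pole) > ws_psnr(ref, at_equator))  # True: polar errors count less
```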

Journal Article
TL;DR: The hybrid method is based on using both image processing and deep learning for improved results and the introduced method is efficient and successful enough at diagnosing diabetic retinopathy from retinal fundus images.
Abstract: The objective of this study is to propose an alternative, hybrid solution method for diagnosing diabetic retinopathy from retinal fundus images. In detail, the hybrid method is based on using both image processing and deep learning for improved results. In medical image processing, reliable diabetic retinopathy detection from digital fundus images is known to be an open problem and needs alternative solutions to be developed. In this context, manual interpretation of retinal fundus images requires a great deal of work, expertise, and excessive processing time. So, doctors need support from imaging and computer vision systems, and the next step is widely associated with the use of intelligent diagnosis systems. The solution method proposed in this study employs image processing with histogram equalization and contrast limited adaptive histogram equalization (CLAHE) techniques. Next, the diagnosis is performed by classification with a convolutional neural network. The method was validated using 400 retinal fundus images from the MESSIDOR database, and the average values obtained for different performance evaluation parameters were: accuracy 97%, sensitivity (recall) 94%, specificity 98%, precision 94%, F-score 94%, and G-mean 95%. In addition to those results, a general comparison with some previously carried out studies has also shown that the introduced method is efficient and successful enough at diagnosing diabetic retinopathy from retinal fundus images. By employing the related image processing techniques and deep learning for diagnosing diabetic retinopathy, the proposed method and the research results are valuable contributions to the associated literature.
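The first pre-processing step named above, global histogram equalization, maps each gray level through the normalized cumulative histogram so that a narrow intensity range is stretched across the full scale. A minimal NumPy sketch for 8-bit grayscale images; CLAHE additionally clips each tile's histogram and interpolates between tile mappings, and is not reproduced here:

```python
import numpy as np

def equalize_hist(img):
    """Global histogram equalization of a uint8 grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]              # CDF at the darkest occupied level
    denom = max(cdf[-1] - cdf_min, 1)      # avoid division by zero on flat images
    lut = np.clip(np.round((cdf - cdf_min) * 255.0 / denom), 0, 255).astype(np.uint8)
    return lut[img]                        # apply the lookup table per pixel

rng = np.random.default_rng(0)
low = rng.integers(100, 111, size=(32, 32)).astype(np.uint8)  # low-contrast image
out = equalize_hist(low)
print(out.min(), out.max())  # stretched to the full 0..255 range
```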

Proceedings Article
14 Jun 2020
TL;DR: Mao et al. employed multiple latent codes to generate multiple feature maps at an intermediate layer of the generator, then composed them with adaptive channel importance to recover the input image.
Abstract: Despite the success of Generative Adversarial Networks (GANs) in image synthesis, applying trained GAN models to real image processing remains challenging. Previous methods typically invert a target image back to the latent space either by back-propagation or by learning an additional encoder. However, the reconstructions from both methods are far from ideal. In this work, we propose a novel approach, called mGANprior, to incorporate well-trained GANs as effective priors for a variety of image processing tasks. In particular, we employ multiple latent codes to generate multiple feature maps at some intermediate layer of the generator, then compose them with adaptive channel importance to recover the input image. Such an over-parameterization of the latent space significantly improves the image reconstruction quality, outperforming existing competitors. The resulting high-fidelity image reconstruction enables the trained GAN models to serve as priors for many real-world applications, such as image colorization, super-resolution, image inpainting, and semantic manipulation. We further analyze the properties of the layer-wise representation learned by GAN models and shed light on what knowledge each layer is capable of representing.
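The composition step above can be written as x = G2(sum over n of G1(z_n) * alpha_n), where G1 and G2 are the generator layers before and after the chosen intermediate layer and each alpha_n holds one importance weight per channel. A toy NumPy sketch with random linear-plus-activation stand-ins for G1 and G2 (hypothetical, not a trained GAN) makes the shapes concrete; in the method, the z_n and alpha_n would be optimized so that x reconstructs a target image:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, channels, spatial, out_dim = 16, 8, 4, 32
W1 = rng.normal(size=(channels * spatial, latent_dim))  # toy G1 weights
W2 = rng.normal(size=(out_dim, channels * spatial))     # toy G2 weights

def G1(z):
    """Toy first half of a generator: latent code -> (channels, spatial) features."""
    return np.maximum(W1 @ z, 0).reshape(channels, spatial)

def G2(features):
    """Toy second half: feature map -> flattened 'image'."""
    return np.tanh(W2 @ features.ravel())

def compose(zs, alphas):
    """Sum per-code feature maps, each scaled channel-wise by its importance vector."""
    fused = sum(G1(z) * a[:, None] for z, a in zip(zs, alphas))
    return G2(fused)

N = 5
zs = [rng.normal(size=latent_dim) for _ in range(N)]
alphas = [rng.random(channels) for _ in range(N)]
x = compose(zs, alphas)
print(x.shape)  # (32,)
```

With a single code and all-ones importance weights, the composition reduces to an ordinary forward pass G2(G1(z)), which is the single-code inversion setting that the multi-code approach over-parameterizes.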

Journal Article
TL;DR: This paper introduces a convolutional neural network (CNN) approach, along with data augmentation and image processing, to categorize brain MRI scan images as cancerous or non-cancerous, and shows that the model requires very little computational power and achieves much better accuracy than other pre-trained models.
Abstract: A brain tumor is a severe cancer disease caused by the uncontrollable and abnormal division of cells. Recent progress in the field of deep learning has helped the health industry in medical imaging for the diagnosis of many diseases. For visual learning and image recognition tasks, the CNN is the most prevalent and commonly used machine learning algorithm. Accordingly, in this paper, we introduce a convolutional neural network (CNN) approach along with data augmentation and image processing to categorize brain MRI scan images as cancerous or non-cancerous. Using the transfer learning approach, we compared the performance of our CNN model trained from scratch with pre-trained VGG-16, ResNet-50, and Inception-v3 models. Although the experiment was conducted on a very small dataset, the experimental results show that our model is very effective and has a very low complexity rate, achieving 100% accuracy, while VGG-16 achieved 96%, ResNet-50 achieved 89% and Inception-v3 achieved 75% accuracy. Our model requires very little computational power and achieves much better accuracy than the other pre-trained models.

Journal Article
TL;DR: An improved whale optimization algorithm is used to optimize the CNN, and the results show that the proposed method outperforms the other compared methods for the early detection of skin cancer.

Posted Content
05 Sep 2020
TL;DR: This paper proposes a high-speed and accurate fully automated method to detect COVID-19 from the patient's CT scan, including a new modified deep convolutional network that is based on ResNet50V2 and enhanced by a feature pyramid network for classifying the selected CT images as COVID-19 or normal.
Abstract: This paper aims to propose a high-speed and accurate fully automated method to detect COVID-19 from the patient's chest CT scan images. We introduce a new dataset that contains 48,260 CT scan images from 282 normal persons and 15,589 images from 95 patients with COVID-19 infections. In the first stage, the system runs our proposed image processing algorithm, which analyzes the view of the lung to discard CT images in which the inside of the lung is not properly visible. This step helps to reduce processing time and false detections. In the next stage, we introduce a novel architecture for improving the classification accuracy of convolutional networks on images containing small but important objects. Our architecture applies a new feature pyramid network designed for classification problems to the ResNet50V2 model, so that the model can examine different resolutions of the image without losing the information of small objects. As COVID-19 infections appear at various scales, and many of them are tiny, using our method helps increase the classification performance remarkably. After running these two phases, the system determines the condition of the patient using a selected threshold. We are the first to evaluate our system in two different ways on Xception, ResNet50V2, and our model. In the single-image classification stage, our model achieved 98.49% accuracy on more than 7996 test images. In the patient condition identification phase, the system correctly identified almost 234 of 245 patients with high speed. Our dataset is accessible at https://github.com/mr7495/COVID-CTset .
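The final patient-level decision can be sketched as counting how many of a patient's CT slices the classifier flags as infected and comparing that count with a threshold. The abstract does not state the threshold value or exactly what is thresholded, so the count-based rule and the parameters `prob_threshold` and `min_positive_slices` below are assumptions:

```python
def patient_is_positive(slice_probs, prob_threshold=0.5, min_positive_slices=2):
    """Flag a patient as COVID-19 if enough CT slices are predicted positive.

    slice_probs: per-slice COVID-19 probabilities from the image classifier.
    Both thresholds are hypothetical placeholders, not values from the paper.
    """
    positives = sum(p >= prob_threshold for p in slice_probs)
    return positives >= min_positive_slices

print(patient_is_positive([0.1, 0.9, 0.8]))  # True: two slices above 0.5
print(patient_is_positive([0.1, 0.9]))       # False: only one positive slice
```

Requiring several positive slices trades a little sensitivity for robustness against a single spurious slice-level detection.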

Journal Article
TL;DR: The obtained experimental results show that the proposed fractional masks are computationally efficient and that their performance is comparable with other standard and fractional smoothing filters.
Abstract: Based on a new definition of the fractional-order derivative and integral, several fractional masks are presented for image denoising. In each method, the process involves constructing a square mask and then applying it to all the corresponding blocks in the noisy image. We measured the denoising performance of our proposed masks using several well-known indices: the peak signal-to-noise ratio (PSNR), ENTROPY, and SSIM. The obtained experimental results show that our proposed masks are computationally efficient, and their performance is comparable with other standard and fractional smoothing filters.
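Two of the indices above can be computed directly: PSNR compares a denoised image with the clean reference, and entropy summarizes the gray-level histogram of a single image. A minimal NumPy sketch for 8-bit grayscale images; SSIM is omitted for brevity, and reading ENTROPY as the Shannon entropy (in bits) of the 256-bin histogram is an assumption:

```python
import numpy as np

def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(max_val ** 2 / mse))

def entropy(img):
    """Shannon entropy (bits) of a uint8 image's gray-level histogram."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]                         # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

half = np.zeros((4, 4), dtype=np.uint8)
half[:, 2:] = 255                        # two equally frequent gray levels
print(entropy(half))                     # 1.0 bit
```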

Journal Article
TL;DR: Deep learning networks for crack detection are reviewed and compared in three groups (classification based, object detection based and segmentation based), and three types of 3D data representations are compared along with the corresponding performance of deep neural networks for 3D object detection.
Abstract: Road pavement crack detection has been a hot research topic for quite a long time due to the practical importance of crack detection for road maintenance and traffic safety. Many methods have been proposed to solve this problem. This paper reviews the three major types of methods used in road crack detection: image processing, machine learning and 3D imaging based methods. Image processing algorithms mainly include threshold segmentation, edge detection and region growing methods, which are used to process images and identify crack features. Crack detection methods based on traditional machine learning, such as neural networks and support vector machines, still rely on hand-crafted features extracted with image processing techniques. Deep learning methods have fundamentally changed the way cracks are detected and greatly improved detection performance. In this work, we review and compare the deep learning neural networks proposed for crack detection in three groups: classification based, object detection based and segmentation based. We also cover the performance evaluation metrics and the performance of these methods on commonly used benchmark datasets. With the maturity of 3D technology, crack detection using 3D data is a new line of research and application. We compare the three types of 3D data representations and study the corresponding performance of the deep neural networks for 3D object detection. Traditional and deep learning based crack detection methods using 3D data are also reviewed in detail.
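Of the classical image processing techniques listed above, region growing is easy to sketch: starting from a seed pixel, repeatedly add 4-connected neighbors whose intensity is close to the seed's. A minimal NumPy sketch; the tolerance rule (absolute difference from the seed intensity) and the toy dark-crack image are assumptions for illustration:

```python
from collections import deque

import numpy as np

def region_grow(img, seed, tol):
    """Boolean mask of pixels 4-connected to seed with |I - I_seed| <= tol."""
    h, w = img.shape
    seed_val = float(img[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < h and 0 <= nx < w) or mask[ny, nx]:
                continue  # outside the image or already in the region
            if abs(float(img[ny, nx]) - seed_val) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

# Toy pavement image: bright background with a dark diagonal crack.
img = np.full((20, 20), 200, dtype=np.uint8)
for i in range(20):
    img[i, i] = 40
    if i + 1 < 20:
        img[i, i + 1] = 40  # widen the crack so it is 4-connected
crack = region_grow(img, seed=(0, 0), tol=30)
print(crack.sum())  # 39 crack pixels
```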

Proceedings ArticleDOI
01 Mar 2020
TL;DR: Kornia is an open source computer vision library consisting of a set of differentiable routines and modules to solve generic computer vision problems, such as image transformations, camera calibration, epipolar geometry, and low level image processing techniques.
Abstract: This work presents Kornia – an open source computer vision library which consists of a set of differentiable routines and modules to solve generic computer vision problems. The package uses PyTorch as its main backend, both for efficiency and to take advantage of reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by OpenCV, Kornia is composed of a set of modules containing operators that can be inserted inside neural networks to train models to perform image transformations, camera calibration, epipolar geometry, and low level image processing techniques, such as filtering and edge detection, that operate directly on high dimensional tensor representations. Examples of classical vision problems implemented using our framework are provided, including a benchmark comparing it with existing vision libraries.

Journal ArticleDOI
TL;DR: Comparisons with other state-of-the-art methods demonstrate that the proposed underwater image enhancement method can achieve higher accuracy of estimated BLs, lower computation time, overall superior performance, and better information retention.
Abstract: Underwater images often suffer severe quality degradation and distortion due to light absorption and scattering in the water medium. A hazy image formation model is widely used to restore image quality. It depends on two optical parameters: the background light (BL) and the transmission map (TM). Underwater images can also be enhanced by color and contrast correction from the perspective of image processing. In this paper, we propose an effective underwater image enhancement method that combines underwater image restoration and color correction. First, a manually annotated background lights (MABLs) database is developed. Based on the relationship between MABLs and the histogram distributions of various underwater images, robust statistical models of BL estimation are provided. Next, the TM of the R channel is roughly estimated based on a new underwater dark channel prior (NUDCP) derived from statistics of clear, high-resolution (HD) underwater images. A scene depth map based on the underwater light attenuation prior (ULAP) and an adjusted reversed saturation map (ARSM) are then applied to compensate and refine the coarse TM of the R channel. Next, the TMs of the G and B channels are estimated from the difference in attenuation ratios between the R channel and the G-B channels. Finally, a variation of white balance is introduced as post-processing to improve the color and contrast of the restored image and give it a dehazed, natural appearance. To guide the priorities of underwater image enhancement, sufficient evaluations are conducted to discuss the impact of the key parameters, including BL and TM, and the importance of color correction. Comparisons with other state-of-the-art methods demonstrate that our proposed underwater image enhancement method achieves higher accuracy of estimated BLs, lower computation time, superior overall performance, and better information retention.
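The hazy image formation model is commonly written per color channel as I = J·t + B·(1 − t). Here I is the observed image, J the scene radiance, B the background light and t the transmission. Once B and t have been estimated, restoration amounts to inverting this model. Below is a minimal sketch, assuming images scaled to [0, 1]. The lower bound `t_min` on the transmission is a common safeguard against amplifying noise and is an assumption here. The paper's BL and TM estimators are not reproduced.

```python
import numpy as np

def restore(image, background_light, transmission, t_min=0.1):
    """Invert I = J*t + B*(1 - t) per channel.

    image: (H, W, 3) in [0, 1]; background_light: (3,) per-channel BL;
    transmission: array broadcastable to (H, W, 3), e.g. per-pixel TMs.
    """
    t = np.maximum(transmission, t_min)  # avoid dividing by tiny transmissions
    radiance = (image - background_light) / t + background_light
    return np.clip(radiance, 0.0, 1.0)
```

In the paper's pipeline the per-channel transmissions come from the estimated R-channel TM and the G-B attenuation ratios. A white-balance step is then applied to the restored image.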

Journal ArticleDOI
TL;DR: This paper develops a brain tumor classification method using a hybrid deep autoencoder with a Bayesian fuzzy clustering-based segmentation approach, which obtains high classification accuracy compared with other state-of-the-art methods.

Journal ArticleDOI
TL;DR: Pixel-DL employs pixel-wise interpolation governed by the physics of photoacoustic wave propagation and then uses a convolutional neural network to reconstruct an image, achieving performance comparable to or better than iterative methods and consistently outperforming other CNN-based approaches.
Abstract: Photoacoustic tomography (PAT) is a non-ionizing imaging modality capable of acquiring high-contrast, high-resolution images of optical absorption at depths greater than traditional optical imaging techniques. Practical constraints on instrumentation and geometry limit the number of available acoustic sensors and their "view" of the imaging target, which results in image reconstruction artifacts that degrade image quality. Iterative reconstruction methods can be used to reduce artifacts but are computationally expensive. In this work, we propose a novel deep learning approach termed pixel-wise deep learning (Pixel-DL) that first employs pixel-wise interpolation governed by the physics of photoacoustic wave propagation and then uses a convolutional neural network to reconstruct an image. Simulated photoacoustic data from synthetic, mouse-brain, lung, and fundus vasculature phantoms were used for training and testing. Results demonstrated that Pixel-DL achieved performance comparable to or better than iterative methods and consistently outperformed other CNN-based approaches for correcting artifacts. Pixel-DL is a computationally efficient approach that enables real-time PAT rendering and improved image reconstruction quality for limited-view and sparse PAT.
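One way to read "pixel-wise interpolation governed by the physics of wave propagation" is as a time-of-flight mapping. For every pixel and every sensor, compute the acoustic travel time and sample that sensor's recorded signal at that time. This produces one image-shaped channel per sensor, which a CNN can then take as input. Below is a minimal sketch under stated assumptions: a 2D geometry, point sensors, and a constant speed of sound of 1500 m/s. The function name is hypothetical, and the paper's exact interpolation scheme may differ.

```python
import numpy as np

def pixelwise_interpolation(signals, sensor_xy, grid_x, grid_y, fs, c=1500.0):
    """Map sensor time series onto the image grid by time-of-flight interpolation.

    signals: (n_sensors, n_samples) pressure traces sampled at rate fs (Hz).
    sensor_xy: (n_sensors, 2) sensor positions in metres.
    Returns (n_sensors, ny, nx): channel k is sensor k's signal evaluated at
    the arrival time from each pixel (zero outside the recorded window).
    """
    xx, yy = np.meshgrid(grid_x, grid_y)
    t_axis = np.arange(signals.shape[1]) / fs
    out = np.empty((len(sensor_xy),) + xx.shape)
    for k, (sx, sy) in enumerate(sensor_xy):
        tof = np.hypot(xx - sx, yy - sy) / c  # travel time pixel -> sensor
        out[k] = np.interp(tof, t_axis, signals[k], left=0.0, right=0.0)
    return out
```

Summing the channels over sensors gives a plain delay-and-sum image. Keeping them separate lets the network learn how to combine them.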

Journal ArticleDOI
TL;DR: An efficient methodology for multilevel segmentation is proposed using the Harris Hawks Optimization (HHO) algorithm and the minimum cross-entropy as a fitness function and it presents an improvement over other segmentation approaches that are currently used in the literature.
Abstract: Segmentation is a crucial phase in image processing because it simplifies the representation of an image and facilitates its analysis. Multilevel thresholding is more efficient for segmenting digital mammograms than classic bi-level thresholding, since it uses a higher number of intensities to represent different regions in the image. The literature offers different techniques for multilevel segmentation; however, most of these approaches do not obtain good segmented images, and they are computationally expensive. Recently, statistical criteria such as Otsu, Kapur, and cross-entropy have been combined with evolutionary and swarm-based strategies to search for the optimal threshold values for multilevel segmentation. In this paper, an efficient methodology for multilevel segmentation is proposed using the Harris Hawks Optimization (HHO) algorithm with the minimum cross-entropy as the fitness function. To substantiate the results and effectiveness of the HHO-based method, it has been tested on a benchmark set of reference images, on the Berkeley segmentation database, and on medical images of digital mammography. The proposed HHO-based solver is compared with other comparable optimizers and with two machine learning algorithms, K-means and Fuzzy IterAg. The comparisons were performed in three groups. The first provides evidence of the optimization capabilities of HHO using the Wilcoxon test. The second verifies segmented image quality using the PSNR, SSIM, and FSIM metrics. The third compares the segmented image with the ground truth through the PRI, GCE, and VoI metrics. The experimental results, validated by statistical analysis, show that the introduced method produces efficient and reliable results in terms of quality, consistency, and accuracy in comparison with the other methods.
This HHO-based method presents an improvement over other segmentation approaches that are currently used in the literature.
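The minimum cross-entropy criterion (Li's formulation) scores a set of thresholds by the cross-entropy between the image and its piecewise-constant approximation, in which each region is replaced by its mean intensity. Dropping the constant term, the quantity to minimise is η = −Σ_regions m1·log(m1/m0). Here m0 is the pixel count of a region and m1 its intensity sum. Below is a minimal sketch for two thresholds. It uses exhaustive search as a stand-in for the HHO metaheuristic, which is not reproduced here. Gray levels are shifted to start at 1 so that the logarithm is defined.

```python
import numpy as np

def mce_fitness(c0, c1, bounds):
    """Minimum cross-entropy objective for regions [bounds[k], bounds[k+1])."""
    eta = 0.0
    for a, b in zip(bounds[:-1], bounds[1:]):
        m0 = c0[b] - c0[a]  # number of pixels in the region
        m1 = c1[b] - c1[a]  # sum of (shifted) intensities in the region
        if m0 > 0:
            eta -= m1 * np.log(m1 / m0)
    return eta

def mce_two_thresholds(hist):
    """Exhaustively find thresholds (t1, t2) minimising the MCE objective."""
    hist = np.asarray(hist, dtype=float)
    levels = np.arange(1, len(hist) + 1)  # gray levels shifted to 1..L
    c0 = np.concatenate(([0.0], np.cumsum(hist)))  # prefix sums for O(1) regions
    c1 = np.concatenate(([0.0], np.cumsum(levels * hist)))
    n = len(hist)
    best, best_t = np.inf, None
    # Exhaustive search replaces HHO here; feasible only for few thresholds.
    for t1 in range(1, n - 1):
        for t2 in range(t1 + 1, n):
            f = mce_fitness(c0, c1, (0, t1, t2, n))
            if f < best:
                best, best_t = f, (t1, t2)
    return best_t
```

Exhaustive search grows combinatorially with the number of thresholds. This is why metaheuristics such as HHO are used to minimise the same fitness function for higher threshold counts.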

Journal ArticleDOI
TL;DR: This work reports a perfect classification rate for Covid-19 detection from X-ray images: the SVM classifier achieved 100.0% classification accuracy using 10-fold cross-validation.

Journal ArticleDOI
TL;DR: A new approach to clutter removal based on robust principal component analysis (PCA) and deep learning is proposed, and it is shown that an iterative algorithm based on this model separates the microbubble signal from the tissue signal better than commonly practiced methods.
Abstract: Contrast enhanced ultrasound is a radiation-free imaging modality which uses encapsulated gas microbubbles for improved visualization of the vascular bed deep within the tissue. It has recently been used to enable imaging with unprecedented subwavelength spatial resolution by relying on super-resolution techniques. A typical preprocessing step in super-resolution ultrasound is to separate the microbubble signal from the cluttering tissue signal. This step has a crucial impact on the final image quality. Here, we propose a new approach to clutter removal based on robust principal component analysis (PCA) and deep learning. We begin by modeling the acquired contrast enhanced ultrasound signal as a combination of low rank and sparse components. This model is used in robust PCA and was previously suggested in the context of ultrasound Doppler processing and dynamic magnetic resonance imaging. We then illustrate that an iterative algorithm based on this model separates the microbubble signal from the tissue signal better than commonly practiced methods. Next, we apply the concept of deep unfolding to suggest a deep network architecture tailored to our clutter filtering problem, which exhibits improved convergence speed and accuracy with respect to its iterative counterpart. We compare the performance of the suggested deep network, on both simulations and in vivo rat brain scans, with a commonly practiced deep-network architecture and with the fast iterative shrinkage algorithm. We show that our architecture exhibits better image quality and contrast.
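The low-rank-plus-sparse model can be made concrete with a generic robust PCA solver. Arrange the frames as columns of a Casorati matrix D (pixels × frames). Slowly varying tissue then forms a low-rank part L, and the moving microbubbles form a sparse part S. The sketch below runs plain proximal-gradient (ISTA) iterations on 0.5‖D − L − S‖²_F + λ_L‖L‖_* + λ_S‖S‖_1. Singular value thresholding handles L and soft thresholding handles S. This is a generic illustration, not the authors' algorithm or their unfolded network. The regularisation weights are hypothetical parameters to tune.

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||x||_1: element-wise shrinkage toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(x, tau):
    # Proximal operator of tau * ||x||_*: shrink the singular values.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def rpca_objective(D, L, S, lam_l, lam_s):
    nuclear = np.linalg.svd(L, compute_uv=False).sum()
    return 0.5 * np.linalg.norm(D - L - S) ** 2 + lam_l * nuclear + lam_s * np.abs(S).sum()

def rpca_ista(D, lam_l, lam_s, n_iter=200):
    """Split a Casorati matrix D (pixels x frames) into low-rank L + sparse S."""
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    step = 0.5  # 1 / Lipschitz constant of the data-fit gradient w.r.t. (L, S)
    for _ in range(n_iter):
        residual = D - L - S  # negative gradient of the data-fit term
        L, S = (singular_value_threshold(L + step * residual, step * lam_l),
                soft_threshold(S + step * residual, step * lam_s))
    return L, S
```

Deep unfolding, as described in the abstract, turns a fixed number of such iterations into network layers and learns their parameters from data instead of hand-tuning them.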

Journal ArticleDOI
TL;DR: A deep learning-based image analysis pipeline that performs segmentation, tracking, and lineage reconstruction on time-lapse movies of Escherichia coli cells trapped in a "mother machine" microfluidic device, a scalable platform for long-term single-cell analysis that is widely used in the field.
Abstract: Microscopy image analysis is a major bottleneck in the quantification of single-cell microscopy data, typically requiring human oversight and curation, which limits both accuracy and throughput. To address this, we developed a deep learning-based image analysis pipeline that performs segmentation, tracking, and lineage reconstruction. Our analysis focuses on time-lapse movies of Escherichia coli cells trapped in a "mother machine" microfluidic device, a scalable platform for long-term single-cell analysis that is widely used in the field. While deep learning has been applied to cell segmentation problems before, our approach is fundamentally innovative in that it also uses machine learning to perform cell tracking and lineage reconstruction. With this framework we obtain high-fidelity results (1% error rate) without human intervention. Further, the algorithm is fast: complete analysis of a typical frame containing ~150 cells takes <700 ms. The framework is not constrained to a particular experimental setup and has the potential to generalize to time-lapse images of other organisms or different experimental configurations. These advances open the door to a myriad of applications, including real-time tracking of gene expression and high-throughput analysis of strain libraries at single-cell resolution.
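For contrast with the learned tracker described above, the sketch below shows a simple geometric baseline for linking cells between frames in a mother machine trap. It is a swapped-in heuristic, not the authors' method. It assumes cells are listed in order from the closed end of the trap and that all cells share a known per-frame growth factor (`growth`, a hypothetical parameter). Each cell centre in the next frame is mapped back into frame-t coordinates and assigned to the cell whose extent contains it. Two new cells landing in the same old cell signal a division.

```python
import numpy as np

def link_frames(lengths_t, lengths_next, growth):
    """Assign each cell in frame t+1 to a source cell in frame t (-1 if none)."""
    edges_t = np.concatenate(([0.0], np.cumsum(lengths_t)))  # boundaries at t
    lengths_next = np.asarray(lengths_next, dtype=float)
    centers_next = np.cumsum(lengths_next) - lengths_next / 2.0
    back = centers_next / growth  # undo growth to get frame-t coordinates
    sources = np.searchsorted(edges_t, back, side="right") - 1
    sources[back >= edges_t[-1]] = -1  # beyond the last tracked cell
    return sources

def divisions(sources):
    """Indices of frame-t cells that map to two or more cells at t+1."""
    ids, counts = np.unique(sources[sources >= 0], return_counts=True)
    return ids[counts >= 2].tolist()

# Three 2-unit cells; after 10% growth the mother cell (index 0) has divided.
src = link_frames([2.0, 2.0, 2.0], [1.1, 1.1, 2.2, 2.2], growth=1.1)
```

This heuristic breaks down when growth rates vary between cells or when segmentation errors occur. Those failure modes motivate the learned tracking and lineage reconstruction described in the abstract.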