
Showing papers on "Image processing published in 2018"


Journal ArticleDOI
TL;DR: The HAM10000 dataset as mentioned in this paper contains 10015 dermatoscopic images collected from different populations and acquired and stored by different modalities; the authors applied different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks.
Abstract: Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.

1,528 citations


Journal ArticleDOI
21 Mar 2018-Nature
TL;DR: A unified framework for image reconstruction—automated transform by manifold approximation (AUTOMAP)—which recasts image reconstruction as a data-driven supervised learning task that allows a mapping between the sensor and the image domain to emerge from an appropriate corpus of training data is presented.
Abstract: Image reconstruction is essential for imaging applications across the physical and life sciences, including optical and radar systems, magnetic resonance imaging, X-ray computed tomography, positron emission tomography, ultrasound imaging and radio astronomy. During image acquisition, the sensor encodes an intermediate representation of an object in the sensor domain, which is subsequently reconstructed into an image by an inversion of the encoding function. Image reconstruction is challenging because analytic knowledge of the exact inverse transform may not exist a priori, especially in the presence of sensor non-idealities and noise. Thus, the standard reconstruction approach involves approximating the inverse function with multiple ad hoc stages in a signal processing chain, the composition of which depends on the details of each acquisition strategy, and often requires expert parameter tuning to optimize reconstruction performance. Here we present a unified framework for image reconstruction-automated transform by manifold approximation (AUTOMAP)-which recasts image reconstruction as a data-driven supervised learning task that allows a mapping between the sensor and the image domain to emerge from an appropriate corpus of training data. We implement AUTOMAP with a deep neural network and exhibit its flexibility in learning reconstruction transforms for various magnetic resonance imaging acquisition strategies, using the same network architecture and hyperparameters. We further demonstrate that manifold learning during training results in sparse representations of domain transforms along low-dimensional data manifolds, and observe superior immunity to noise and a reduction in reconstruction artefacts compared with conventional handcrafted reconstruction methods. In addition to improving the reconstruction performance of existing acquisition methodologies, we anticipate that AUTOMAP and other learned reconstruction approaches will accelerate the development of new acquisition strategies across imaging modalities.
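
As a rough formalization of the data-driven framing described above (notation ours, not the paper's), the reconstruction operator is simply a parametric mapping fitted to paired sensor-domain/image-domain examples:

    \[
    \hat{x} = f_\theta(y), \qquad
    \theta^{\star} = \arg\min_{\theta} \frac{1}{N}\sum_{i=1}^{N}
        \bigl\| f_\theta(y_i) - x_i \bigr\|_2^2
    \]
    % y_i: sensor-domain data (e.g., k-space), x_i: corresponding image,
    % f_theta: the deep network that replaces the handcrafted inverse transform.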

1,361 citations


Journal ArticleDOI
25 Apr 2018
TL;DR: An overview of core ideas in GSP and their connection to conventional digital signal processing are provided, along with a brief historical perspective to highlight how concepts recently developed build on top of prior research in other areas.
Abstract: Research in graph signal processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper, we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing, along with a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering, or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning.
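
For readers new to the area, the core objects mentioned above can be summarized with standard definitions (textbook GSP facts, not results of this paper): for an undirected graph with adjacency matrix A and degree matrix D, the combinatorial Laplacian, the graph Fourier transform of a signal x, and a graph filter with spectral response h are

    \[
    L = D - A = U \Lambda U^{\top}, \qquad
    \hat{x} = U^{\top} x, \qquad
    y = U\, h(\Lambda)\, U^{\top} x
    \]
    % U: Laplacian eigenvectors (graph Fourier basis), Lambda: eigenvalues
    % interpreted as graph frequencies, h(.): the filter's spectral response.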

1,306 citations


Posted Content
TL;DR: This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
Abstract: Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5% which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIIT Pets, FGVC Aircraft, and Stanford Cars.
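
A minimal sketch of what a sub-policy looks like in practice, assuming PIL for the image operations; the operations, probabilities, and magnitudes below are illustrative stand-ins, not policies learned in the paper:

    import random
    from PIL import Image

    def rotate(img, magnitude):
        # Rotate by `magnitude` degrees.
        return img.rotate(magnitude)

    def shear_x(img, magnitude):
        # Horizontal shear via an affine transform.
        return img.transform(img.size, Image.AFFINE, (1, magnitude, 0, 0, 1, 0))

    # A sub-policy: two (operation, probability, magnitude) triples.
    SUB_POLICY = [(rotate, 0.7, 15), (shear_x, 0.5, 0.2)]

    def apply_sub_policy(img, sub_policy=SUB_POLICY):
        # Each operation fires independently with its own probability.
        for op, prob, magnitude in sub_policy:
            if random.random() < prob:
                img = op(img, magnitude)
        return img

    # usage: augmented = apply_sub_policy(Image.open("example.jpg"))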

1,278 citations


Journal ArticleDOI
TL;DR: A framework for reconstructing dynamic sequences of 2-D cardiac magnetic resonance images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process is proposed and it is demonstrated that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches.
Abstract: Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2-D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data are acquired using aggressive Cartesian undersampling. First, we show that when each 2-D image frame is reconstructed independently, the proposed method outperforms state-of-the-art 2-D compressed sensing approaches, such as dictionary learning-based MR image reconstruction, in terms of reconstruction error and reconstruction speed. Second, when reconstructing the frames of the sequences jointly, we demonstrate that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches. We show that the proposed method consistently outperforms state-of-the-art methods and is capable of preserving anatomical structure more faithfully up to 11-fold undersampling. Moreover, reconstruction is very fast: each complete dynamic sequence can be reconstructed in less than 10 s and, for the 2-D case, each image frame can be reconstructed in 23 ms, enabling real-time applications.
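
The data-consistency idea referred to above can be written in its standard form (generic notation; the paper's exact formulation may differ in detail): after each CNN sub-network, the k-space of the output is corrected at the sampled locations Omega using the acquired data s_0:

    \[
    \hat{s}(k) =
    \begin{cases}
        s_{\mathrm{cnn}}(k), & k \notin \Omega, \\[4pt]
        \dfrac{s_{\mathrm{cnn}}(k) + \lambda\, s_{0}(k)}{1 + \lambda}, & k \in \Omega,
    \end{cases}
    \]
    % s_cnn: Fourier transform of the CNN output; letting lambda grow large
    % simply replaces the CNN prediction with the measured samples (noiseless case).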

1,062 citations


Journal ArticleDOI
TL;DR: In this article, the authors provide a short overview of recent advances and some associated challenges in machine learning applied to medical image processing and image analysis, and provide a starting point for people interested in experimenting and perhaps contributing to the field of machine learning for medical imaging.
Abstract: What has happened in machine learning lately, and what does it mean for the future of medical image analysis? Machine learning has witnessed a tremendous amount of attention over the last few years. The current boom started around 2009 when so-called deep artificial neural networks began outperforming other established models on a number of important benchmarks. Deep neural networks are now the state-of-the-art machine learning models across a variety of areas, from image analysis to natural language processing, and are widely deployed in academia and industry. These developments hold huge potential for medical imaging technology, medical data analysis, medical diagnostics and healthcare in general, a potential that is only slowly being realized. We provide a short overview of recent advances and some associated challenges in machine learning applied to medical image processing and image analysis. As this has become a very broad and fast-expanding field we will not survey the entire landscape of applications, but put particular focus on deep learning in MRI. Our aim is threefold: (i) give a brief introduction to deep learning with pointers to core references; (ii) indicate how deep learning has been applied to the entire MRI processing chain, from acquisition to image retrieval, from segmentation to disease prediction; (iii) provide a starting point for people interested in experimenting and perhaps contributing to the field of machine learning for medical imaging by pointing out good educational resources, state-of-the-art open-source code, and interesting sources of data and problems related to medical imaging.

991 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper introduced a new CT image denoising method based on the generative adversarial network (GAN) with Wasserstein distance and perceptual similarity, which is capable of not only reducing the image noise level but also preserving critical structural information.
Abstract: The continuous development and extensive use of computed tomography (CT) in medical practice has raised a public concern over the associated radiation dose to the patient. Reducing the radiation dose may lead to increased noise and artifacts, which can adversely affect the radiologists’ judgment and confidence. Hence, advanced image reconstruction from low-dose CT data is needed to improve the diagnostic performance, which is a challenging problem due to its ill-posed nature. Over the past years, various low-dose CT methods have produced impressive results. However, most of the algorithms developed for this application, including the recently popularized deep learning techniques, aim for minimizing the mean-squared error (MSE) between a denoised CT image and the ground truth under generic penalties. Although the peak signal-to-noise ratio is improved, MSE- or weighted-MSE-based methods can compromise the visibility of important structural details after aggressive denoising. This paper introduces a new CT image denoising method based on the generative adversarial network (GAN) with Wasserstein distance and perceptual similarity. The Wasserstein distance is a key concept of the optimal transport theory and promises to improve the performance of GAN. The perceptual loss suppresses noise by comparing the perceptual features of a denoised output against those of the ground truth in an established feature space, while the GAN focuses more on migrating the data noise distribution from strong to weak statistically. Therefore, our proposed method transfers our knowledge of visual perception to the image denoising task and is capable of not only reducing the image noise level but also trying to keep the critical information at the same time. Promising results have been obtained in our experiments with clinical CT images.
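
Schematically, the objective combines an adversarial term with a perceptual term. The form below is the generic Wasserstein-GAN-plus-perceptual-loss structure (our notation; weights and regularizers such as the gradient penalty are omitted and may differ from the paper), with z the low-dose input, x the routine-dose target, G the denoiser, and phi a pretrained feature extractor such as VGG:

    \[
    \min_{G}\;\max_{D}\;
        \mathbb{E}_{x}\bigl[D(x)\bigr] - \mathbb{E}_{z}\bigl[D(G(z))\bigr]
        \;+\; \lambda_{p}\, \mathbb{E}\bigl[\,\lVert \phi(G(z)) - \phi(x) \rVert_2^2 \,\bigr]
    \]
    % The adversarial term matches the denoised-image distribution to the
    % routine-dose distribution; the perceptual term compares feature maps
    % rather than pixels, avoiding the over-smoothing typical of MSE losses.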

916 citations


Journal ArticleDOI
01 Jan 2018
TL;DR: It is shown that reconfigurable memristor crossbars composed of hafnium oxide memristors on top of metal-oxide-semiconductor transistors are capable of analogue vector-matrix multiplication with array sizes of up to 128 × 64 cells.
Abstract: Memristor crossbars offer reconfigurable non-volatile resistance states and could remove the speed and energy efficiency bottleneck in vector-matrix multiplication, a core computing task in signal and image processing. Using such systems to multiply an analogue-voltage-amplitude-vector by an analogue-conductance-matrix at a reasonably large scale has, however, proved challenging due to difficulties in device engineering and array integration. Here we show that reconfigurable memristor crossbars composed of hafnium oxide memristors on top of metal-oxide-semiconductor transistors are capable of analogue vector-matrix multiplication with array sizes of up to 128 × 64 cells. Our output precision (5–8 bits, depending on the array size) is the result of high device yield (99.8%) and the multilevel, stable states of the memristors, while the linear device current–voltage characteristics and low wire resistance between cells leads to high accuracy. With the large memristor crossbars, we demonstrate signal processing, image compression and convolutional filtering, which are expected to be important applications in the development of the Internet of Things (IoT) and edge computing.
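
The physics behind the analogue multiply-accumulate is the textbook crossbar relation (not specific to this paper): with input voltages applied to the rows and conductances encoding the matrix, each column current is

    \[
    I_{j} = \sum_{i} G_{ij}\, V_{i}
    \qquad\Longleftrightarrow\qquad
    \mathbf{I} = G^{\top} \mathbf{V}
    \]
    % Ohm's law gives the per-cell products G_ij * V_i, and Kirchhoff's current
    % law sums them along each column in a single step.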

817 citations


Journal ArticleDOI
07 Mar 2018-eLife
TL;DR: New open-source software called cisTEM (computational imaging system for transmission electron microscopy) for the processing of data for high-resolution electron cryo-microscopy and single-particle averaging is developed, optimized to enable processing of typical datasets on a high-end, CPU-based workstation in half a day or less, comparable to GPU-accelerated processing.
Abstract: We have developed new open-source software called cisTEM (computational imaging system for transmission electron microscopy) for the processing of data for high-resolution electron cryo-microscopy and single-particle averaging. cisTEM features a graphical user interface that is used to submit jobs, monitor their progress, and display results. It implements a full processing pipeline including movie processing, image defocus determination, automatic particle picking, 2D classification, ab-initio 3D map generation from random parameters, 3D classification, and high-resolution refinement and reconstruction. Some of these steps implement newly-developed algorithms; others were adapted from previously published algorithms. The software is optimized to enable processing of typical datasets (2000 micrographs, 200 k - 300 k particles) on a high-end, CPU-based workstation in half a day or less, comparable to GPU-accelerated processing. Jobs can also be scheduled on large computer clusters using flexible run profiles that can be adapted for most computing environments. cisTEM is available for download from cistem.org.

746 citations


Journal ArticleDOI
TL;DR: This work shows how content-aware image restoration based on deep learning extends the range of biological phenomena observable by microscopy by bypassing the trade-offs between imaging speed, resolution, and maximal light exposure that limit fluorescence imaging to enable discovery.
Abstract: Fluorescence microscopy is a key driver of discoveries in the life sciences, with observable phenomena being limited by the optics of the microscope, the chemistry of the fluorophores, and the maximum photon exposure tolerated by the sample. These limits necessitate trade-offs between imaging speed, spatial resolution, light exposure, and imaging depth. In this work we show how content-aware image restoration based on deep learning extends the range of biological phenomena observable by microscopy. We demonstrate on eight concrete examples how microscopy images can be restored even if 60-fold fewer photons are used during acquisition, how near isotropic resolution can be achieved with up to tenfold under-sampling along the axial direction, and how tubular and granular structures smaller than the diffraction limit can be resolved at 20-times-higher frame rates compared to state-of-the-art methods. All developed image restoration methods are freely available as open source software in Python, FIJI, and KNIME. Content-aware image restoration (CARE) uses deep learning to improve microscopy images. CARE bypasses the trade-offs between imaging speed, resolution, and maximal light exposure that limit fluorescence imaging to enable discovery.

694 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a dataset of raw short-exposure low-light images with corresponding long-exposure reference images is introduced to support the development of learning-based pipelines for low-light image processing.
Abstract: Imaging in low light is challenging due to low photon count and low SNR. Short-exposure images suffer from noise, while long exposure can induce blur and is often impractical. A variety of denoising, deblurring, and enhancement techniques have been proposed, but their effectiveness is limited in extreme conditions, such as video-rate imaging at night. To support the development of learning-based pipelines for low-light image processing, we introduce a dataset of raw short-exposure low-light images, with corresponding long-exposure reference images. Using the presented dataset, we develop a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network. The network operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data. We report promising results on the new dataset, analyze factors that affect performance, and highlight opportunities for future work.
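
A minimal sketch of the kind of raw preprocessing the pipeline above implies: pack the Bayer mosaic into a four-channel half-resolution array and amplify by the exposure ratio before the fully-convolutional network. The black/white levels and the RGGB layout are illustrative assumptions that depend on the sensor; this is not the authors' exact code:

    import numpy as np

    def pack_and_amplify(bayer, black_level=512, white_level=16383, ratio=100.0):
        # Normalize the raw mosaic (illustrative 14-bit levels).
        norm = (bayer.astype(np.float32) - black_level) / (white_level - black_level)
        norm = np.clip(norm, 0.0, 1.0)
        h, w = norm.shape
        # Pack the 2x2 Bayer pattern into 4 channels at half resolution
        # (assumed RGGB layout).
        packed = np.stack([norm[0:h:2, 0:w:2],   # R
                           norm[0:h:2, 1:w:2],   # G
                           norm[1:h:2, 1:w:2],   # B
                           norm[1:h:2, 0:w:2]],  # G
                          axis=-1)
        # Amplify by the desired exposure ratio (e.g. x100, x300) so the
        # network sees a brightness-normalized input.
        return packed * ratio

    # usage: net_input = pack_and_amplify(raw_bayer, ratio=300.0)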

Journal ArticleDOI
TL;DR: A user-friendly, multi-platform freeware which enables the calculation of conventional, histogram-based, textural, and shape features from PET, SPECT, MR, CT, and US images, or from any combination of imaging modalities called LIFEx is presented.
Abstract: Textural and shape analysis is gaining considerable interest in medical imaging, particularly to identify parameters characterizing tumor heterogeneity and to feed radiomic models. Here, we present a free, multiplatform, and easy-to-use freeware called LIFEx, which enables the calculation of conventional, histogram-based, textural, and shape features from PET, SPECT, MR, CT, and US images, or from any combination of imaging modalities. The application does not require any programming skills and was developed for medical imaging professionals. The goal is that independent and multicenter evidence of the usefulness and limitations of radiomic features for characterization of tumor heterogeneity and subsequent patient management can be gathered. Many options are offered for interactive textural index calculation and for increasing the reproducibility among centers. The software already benefits from a large user community (more than 800 registered users), and interactions within that community are part of the development strategy.Significance: This study presents a user-friendly, multi-platform freeware to extract radiomic features from PET, SPECT, MR, CT, and US images, or any combination of imaging modalities. Cancer Res; 78(16); 4786-9. ©2018 AACR.

Journal ArticleDOI
TL;DR: This work introduces an effective technique to enhance the images captured underwater and degraded due to the medium scattering and absorption by building on the blending of two images that are directly derived from a color-compensated and white-balanced version of the original degraded image.
Abstract: We introduce an effective technique to enhance the images captured underwater and degraded due to the medium scattering and absorption. Our method is a single image approach that does not require specialized hardware or knowledge about the underwater conditions or scene structure. It builds on the blending of two images that are directly derived from a color-compensated and white-balanced version of the original degraded image. The two images being fused, as well as their associated weight maps, are defined to promote the transfer of edges and color contrast to the output image. To avoid artifacts in the low-frequency components of the reconstructed image caused by sharp weight-map transitions, we also adopt a multiscale fusion strategy. Our extensive qualitative and quantitative evaluation reveals that our enhanced images and videos are characterized by better exposedness of the dark regions, improved global contrast, and edge sharpness. Our validation also proves that our algorithm is reasonably independent of the camera settings, and improves the accuracy of several image processing applications, such as image segmentation and keypoint matching.
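
The multiscale fusion step mentioned above follows the standard pyramid-blending form (generic notation; the paper's specific inputs and weight definitions differ in detail): each level of the output is a weighted sum of the Laplacian pyramids of the two derived inputs, with the normalized weight maps blended through Gaussian pyramids,

    \[
    R_{\ell}(x) = \sum_{k=1}^{2} G_{\ell}\{\bar{W}_{k}\}(x)\; L_{\ell}\{I_{k}\}(x),
    \qquad
    \bar{W}_{k} = \frac{W_{k}}{\sum_{m} W_{m}}
    \]
    % L_l{.} and G_l{.}: level l of the Laplacian and Gaussian pyramids;
    % collapsing the fused pyramid R_l yields the enhanced image, and blending
    % the weights at multiple scales avoids halos at sharp weight transitions.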

Posted Content
TL;DR: A pipeline for processing low-light images is developed, based on end-to-end training of a fully-convolutional network that operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data.
Abstract: Imaging in low light is challenging due to low photon count and low SNR. Short-exposure images suffer from noise, while long exposure can induce blur and is often impractical. A variety of denoising, deblurring, and enhancement techniques have been proposed, but their effectiveness is limited in extreme conditions, such as video-rate imaging at night. To support the development of learning-based pipelines for low-light image processing, we introduce a dataset of raw short-exposure low-light images, with corresponding long-exposure reference images. Using the presented dataset, we develop a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network. The network operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data. We report promising results on the new dataset, analyze factors that affect performance, and highlight opportunities for future work. The results are shown in the supplementary video at this https URL

Proceedings ArticleDOI
18 Jun 2018
TL;DR: It is found that ensembles perform better and lead to better-calibrated predictive uncertainties, which are the basis for many active learning algorithms, whereas Monte-Carlo Dropout uncertainties perform worse.
Abstract: Deep learning methods have become the de-facto standard for challenging image processing tasks such as image classification. One major hurdle of deep learning approaches is that large sets of labeled data are necessary, which can be prohibitively costly to obtain, particularly in medical image diagnosis applications. Active learning techniques can alleviate this labeling effort. In this paper we investigate some recently proposed methods for active learning with high-dimensional data and convolutional neural network classifiers. We compare ensemble-based methods against Monte-Carlo Dropout and geometric approaches. We find that ensembles perform better and lead to more calibrated predictive uncertainties, which are the basis for many active learning algorithms. To investigate why Monte-Carlo Dropout uncertainties perform worse, we explore potential differences in isolation in a series of experiments. We show results for MNIST and CIFAR-10, on which we achieve a test set accuracy of 90% with roughly 12,200 labeled images, and initial results on ImageNet. Additionally, we show results on a large, highly class-imbalanced diabetic retinopathy dataset. We observe that the ensemble-based active learning effectively counteracts this imbalance during acquisition.
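
A minimal sketch of an entropy-based acquisition step of the kind evaluated above, assuming the softmax outputs of each ensemble member on the unlabeled pool are already available (generic illustration, not the authors' code):

    import numpy as np

    def ensemble_entropy_acquisition(prob_stack, batch_size=100):
        # prob_stack: (n_members, n_samples, n_classes) softmax outputs of the
        # ensemble on the unlabeled pool.
        mean_probs = prob_stack.mean(axis=0)                      # ensemble average
        entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
        # Return the indices of the most uncertain samples to label next.
        return np.argsort(-entropy)[:batch_size]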

PatentDOI
TL;DR: The experimental implementation of sparse coding algorithms in a bio-inspired approach using a 32 × 32 crossbar array of analog memristors enables efficient implementation of pattern matching and lateral neuron inhibition and allows input data to be sparsely encoded using neuron activities and stored dictionary elements.
Abstract: Sparse representation of information performs powerful feature extraction on high-dimensional data and is of interest for applications in signal processing, machine vision, object recognition, and neurobiology. Sparse coding is a mechanism by which biological neural systems can efficiently process complex sensory data while consuming very little power. Sparse coding algorithms in a bio-inspired approach can be implemented in a crossbar array of memristors (resistive memory devices). This network enables efficient implementation of pattern matching and lateral neuron inhibition, allowing input data to be sparsely encoded using neuron activities and stored dictionary elements. The reconstructed input can be obtained by performing a backward pass through the same crossbar matrix using the neuron activity vector as input. Different dictionary sets can be trained and stored in the same system, depending on the nature of the input signals. Using the sparse coding algorithm, natural image processing is performed based on a learned dictionary.
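
For reference, the optimization the crossbar implements is the standard sparse-coding objective (textbook form, our notation):

    \[
    \min_{a}\; \tfrac{1}{2}\,\lVert x - D a \rVert_{2}^{2} + \lambda \lVert a \rVert_{1},
    \qquad \hat{x} = D a
    \]
    % x: input signal, D: dictionary stored as crossbar conductances,
    % a: sparse neuron-activity vector; the reconstruction D a corresponds to
    % the backward pass through the same crossbar.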

Book ChapterDOI
03 Oct 2018
TL;DR: The principles of morphological segmentation will be presented and illustrated by means of examples, starting from the simplest ones and introducing step by step more complex segmentation tools.
Abstract: This chapter presents the principles of morphological segmentation. Segmentation is one of the key problems in image processing. In fact, one should say segmentations, because there exist as many techniques as there are specific situations. An original method of segmentation based on the use of watershed lines has been developed in the framework of mathematical morphology. The chapter describes some useful morphological tools for segmentation: gradient, top-hat transform, distance function, geodesic distance function, and geodesic reconstructions. The gradient image is used in the watershed transformation, because the main criterion for the segmentation in many applications is the homogeneity of the gray values of the objects present in the image. The problems encountered in the segmentation process will be best illustrated by presenting a complete and typical segmentation problem in the field of automated cytology. The oversegmentation produced by direct construction of the watershed line is due to the fact that every regional minimum becomes the center of a catchment basin.
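
A minimal marker-controlled watershed in the spirit of the chapter, using the scikit-image API (adapted from its documentation example; the chapter itself is tool-agnostic, and the thresholds are specific to the sample image):

    import numpy as np
    from skimage import data, filters, segmentation  # recent scikit-image versions

    coins = data.coins()                         # sample grayscale image
    gradient = filters.sobel(coins)              # gradient image used as relief

    # Markers suppress the oversegmentation that direct watershed construction
    # would produce: one seed label for background, one for the objects.
    markers = np.zeros_like(coins, dtype=int)
    markers[coins < 30] = 1                      # background seed
    markers[coins > 150] = 2                     # object seed
    labels = segmentation.watershed(gradient, markers)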

Proceedings ArticleDOI
18 Jun 2018
TL;DR: PointFusion as mentioned in this paper is a generic 3D object detection method that leverages both image and 3D point cloud information, which predicts multiple 3D box hypotheses and their confidences using the input 3D points as spatial anchors.
Abstract: We present PointFusion, a generic 3D object detection method that leverages both image and 3D point cloud information. Unlike existing methods that either use multistage pipelines or hold sensor and dataset-specific assumptions, PointFusion is conceptually simple and application-agnostic. The image data and the raw point cloud data are independently processed by a CNN and a PointNet architecture, respectively. The resulting outputs are then combined by a novel fusion network, which predicts multiple 3D box hypotheses and their confidences, using the input 3D points as spatial anchors. We evaluate PointFusion on two distinctive datasets: the KITTI dataset that features driving scenes captured with a lidar-camera setup, and the SUN-RGBD dataset that captures indoor environments with RGB-D cameras. Our model is the first one that is able to perform better or on-par with the state-of-the-art on these diverse datasets without any dataset-specific model tuning.

Proceedings Article
01 Nov 2018
TL;DR: A new deep learning based method that can effectively distinguish AI-generated fake videos from real videos is described, which saves plenty of time and resources in training data collection and is more robust compared to others.
Abstract: In this work, we describe a new deep learning based method that can effectively distinguish AI-generated fake videos (referred to as DeepFake videos hereafter) from real videos. Our method is based on the observation that current DeepFake algorithms can only generate images of limited resolution, which need to be further warped to match the original faces in the source video. Such transforms leave distinctive artifacts in the resulting DeepFake videos, and we show that they can be effectively captured by convolutional neural networks (CNNs). Compared to previous methods which use a large amount of real and DeepFake generated images to train a CNN classifier, our method does not need DeepFake generated images as negative training examples since we target the artifacts in affine face warping as the distinctive feature to distinguish real and fake images. The advantages of our method are two-fold: (1) such artifacts can be simulated directly using simple image processing operations on an image to make it a negative example, and since training a DeepFake model to generate negative examples is time-consuming and resource-demanding, our method saves plenty of time and resources in training data collection; (2) since such artifacts generally exist in DeepFake videos from different sources, our method is more robust compared to others. Our method is evaluated on two sets of DeepFake video datasets for its effectiveness in practice.
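
A rough sketch of how such negative examples can be simulated with simple image processing, as argued above: degrade a real face crop so it carries a resolution mismatch similar to a warped DeepFake face. The scale range and blur radius are illustrative choices, not the authors' parameters:

    import random
    from PIL import Image, ImageFilter

    def simulate_warp_artifacts(face_img):
        # Create a "fake-like" negative example from a real face crop by
        # generating it at low resolution and warping it back to full size.
        w, h = face_img.size
        scale = random.uniform(0.25, 0.5)
        small = face_img.resize((int(w * scale), int(h * scale)), Image.BILINEAR)
        warped = small.resize((w, h), Image.BILINEAR)
        # Mild smoothing mimics the blending applied around swapped faces.
        return warped.filter(ImageFilter.GaussianBlur(radius=1.5))

    # usage: negative = simulate_warp_artifacts(Image.open("real_face.png"))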

Journal ArticleDOI
TL;DR: RefineGAN as mentioned in this paper is a variant of fully-residual convolutional autoencoder and generative adversarial networks (GANs) specifically designed for CS-MRI formulation; it employs deeper generator and discriminator networks with cyclic data consistency loss for faithful interpolation in the given under-sampled data.
Abstract: Compressed sensing magnetic resonance imaging (CS-MRI) has provided theoretical foundations upon which the time-consuming MRI acquisition process can be accelerated. However, it primarily relies on iterative numerical solvers, which still hinders their adaptation in time-critical applications. In addition, recent advances in deep neural networks have shown their potential in computer vision and image processing, but their adaptation to MRI reconstruction is still in an early stage. In this paper, we propose a novel deep learning-based generative adversarial model, RefineGAN, for fast and accurate CS-MRI reconstruction. The proposed model is a variant of fully-residual convolutional autoencoder and generative adversarial networks (GANs), specifically designed for CS-MRI formulation; it employs deeper generator and discriminator networks with cyclic data consistency loss for faithful interpolation in the given under-sampled k-space data. In addition, our solution leverages a chained network to further enhance the reconstruction quality. RefineGAN is fast and accurate: the reconstruction process is extremely rapid, as low as tens of milliseconds for reconstruction of a 256 × 256 image, because it is one-way deployment on a feed-forward network, and the image quality is superior even for extremely low sampling rate (as low as 10%) due to the data-driven nature of the method. We demonstrate that RefineGAN outperforms the state-of-the-art CS-MRI methods by a large margin in terms of both running time and image quality via evaluation using several open-source MRI databases.

Journal ArticleDOI
TL;DR: This special issue focuses on data-driven tomographic reconstruction and covers the whole workflow of medical imaging: from tomographic raw data/features to reconstructed images and then extracted diagnostic features/readings.
Abstract: Over past several years, machine learning, or more generally artificial intelligence, has generated overwhelming research interest and attracted unprecedented public attention. As tomographic imaging researchers, we share the excitement from our imaging perspective [item 1) in the Appendix], and organized this special issue dedicated to the theme of “Machine learning for image reconstruction.” This special issue is a sister issue of the special issue published in May 2016 of this journal with the theme “Deep learning in medical imaging” [item 2) in the Appendix]. While the previous special issue targeted medical image processing/analysis, this special issue focuses on data-driven tomographic reconstruction. These two special issues are highly complementary, since image reconstruction and image analysis are two of the main pillars for medical imaging. Together we cover the whole workflow of medical imaging: from tomographic raw data/features to reconstructed images and then extracted diagnostic features/readings.

Journal ArticleDOI
TL;DR: Computational times for the DCNN are shorter than those of the most efficient edge detection algorithms, not considering the training process, which shows significant promise for future adoption of DCNN methods for image-based damage detection in concrete.

Journal ArticleDOI
TL;DR: A Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components, achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces.
Abstract: Human faces in surveillance videos often suffer from severe image blur, dramatic pose variations, and occlusion. In this paper, we propose a comprehensive framework based on Convolutional Neural Networks (CNN) to overcome challenges in video-based face recognition (VFR). First, to learn blur-robust face representations, we artificially blur training data composed of clear still images to account for a shortfall in real-world video training data. Using training data composed of both still images and artificially blurred data, CNN is encouraged to learn blur-insensitive features automatically. Second, to enhance robustness of CNN features to pose variations and occlusion, we propose a Trunk-Branch Ensemble CNN model (TBE-CNN), which extracts complementary information from holistic face images and patches cropped around facial components. TBE-CNN is an end-to-end model that extracts features efficiently by sharing the low- and middle-level convolutional layers between the trunk and branch networks. Third, to further promote the discriminative power of the representations learnt by TBE-CNN, we propose an improved triplet loss function. Systematic experiments justify the effectiveness of the proposed techniques. Most impressively, TBE-CNN achieves state-of-the-art performance on three popular video face databases: PaSC, COX Face, and YouTube Faces. With the proposed techniques, we also obtain the first place in the BTAS 2016 Video Person Recognition Evaluation.

Journal ArticleDOI
TL;DR: The aim of this paper is first to explore the performance of DL architectures for the RS hyperspectral data set classification and second to introduce a new 3-D DL approach that enables joint processing of spectral and spatial information.
Abstract: Recently, a variety of approaches have been enriching the field of remote sensing (RS) image processing and analysis. Unfortunately, existing methods remain limited in the face of the rich spatiospectral content of today’s large data sets. It would seem intriguing to resort to deep learning (DL)-based approaches at this stage with regard to their ability to offer accurate semantic interpretation of the data. However, the specificity introduced by the coexistence of spectral and spatial content in the RS data sets widens the scope of the challenges presented to adapt DL methods to these contexts. Therefore, the aim of this paper is first to explore the performance of DL architectures for the RS hyperspectral data set classification and second to introduce a new 3-D DL approach that enables joint processing of spectral and spatial information. A set of 3-D schemes is proposed and evaluated. Experimental results based on well-known hyperspectral data sets demonstrate that the proposed method is able to achieve a better classification rate than state-of-the-art methods with lower computational costs.

Journal ArticleDOI
TL;DR: A relaxed version of PGD is proposed wherein gradient descent enforces measurement consistency while a CNN recursively projects the solution closer to the space of desired reconstruction images; the approach shows an improvement over total variation-based regularization, dictionary learning, and a state-of-the-art deep learning-based direct reconstruction technique.
Abstract: We present a new image reconstruction method that replaces the projector in a projected gradient descent (PGD) with a convolutional neural network (CNN). Recently, CNNs trained as image-to-image regressors have been successfully used to solve inverse problems in imaging. However, unlike existing iterative image reconstruction algorithms, these CNN-based approaches usually lack a feedback mechanism to enforce that the reconstructed image is consistent with the measurements. We propose a relaxed version of PGD wherein gradient descent enforces measurement consistency, while a CNN recursively projects the solution closer to the space of desired reconstruction images. We show that this algorithm is guaranteed to converge and, under certain conditions, converges to a local minimum of a non-convex inverse problem. Finally, we propose a simple scheme to train the CNN to act like a projector. Our experiments on sparse-view computed-tomography reconstruction show an improvement over total variation-based regularization, dictionary learning, and a state-of-the-art deep learning-based direct reconstruction technique.
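
In generic notation (ours, not verbatim from the paper), one iteration of such a relaxed projected-gradient scheme looks like this: a gradient step enforces consistency with the measurements y = Hx, the trained CNN F_theta acts as the projector, and a relaxation weight alpha_k mixes in the previous iterate,

    \[
    x_{k+1} \;=\; (1-\alpha_{k})\, x_{k} \;+\;
        \alpha_{k}\, F_{\theta}\!\bigl( x_{k} - \gamma\, H^{\top} (H x_{k} - y) \bigr)
    \]
    % H: forward (measurement) operator, gamma: gradient step size; setting
    % alpha_k = 1 recovers plain projected gradient descent with a CNN projector.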

Journal ArticleDOI
TL;DR: A new no-reference (NR) IQA model is developed and a robust image enhancement framework is established based on quality optimization, which can well enhance natural images, low-contrast images, low-light images, and dehazed images.
Abstract: In this paper, we investigate the problem of image quality assessment (IQA) and enhancement via machine learning. This issue has long attracted a wide range of attention in computational intelligence and image processing communities, since, for many practical applications, e.g., object detection and recognition, raw images usually need to be appropriately enhanced to raise the visual quality (e.g., visibility and contrast). In fact, proper enhancement can noticeably improve the quality of input images, even beyond that of the originally captured images, which are generally thought to be of the best quality. In this paper, we present two main contributions. The first contribution is to develop a new no-reference (NR) IQA model. Given an image, our quality measure first extracts 17 features through analysis of contrast, sharpness, brightness and more, and then yields a measure of visual quality using a regression module, which is learned with big-data training samples that are much bigger than the size of relevant image data sets. The results of experiments on nine data sets validate the superiority and efficiency of our blind metric compared with typical state-of-the-art full-reference, reduced-reference and NR IQA methods. The second contribution is that a robust image enhancement framework is established based on quality optimization. For an input image, by the guidance of the proposed NR-IQA measure, we conduct histogram modification to successively rectify image brightness and contrast to a proper level. Thorough tests demonstrate that our framework can well enhance natural images, low-contrast images, low-light images, and dehazed images. The source code will be released at https://sites.google.com/site/guke198701/publications .

Journal ArticleDOI
Chang Min Hyun, Hwa Pyung Kim, Sung Min Lee, Sungchul Lee, Jin Keun Seo
TL;DR: In this article, a deep learning method for faster magnetic resonance imaging (MRI) is presented that reduces k-space data with sub-Nyquist sampling strategies; the image folding caused by uniform subsampling is handled by adding a small number of low-frequency k-space data.
Abstract: This paper presents a deep learning method for faster magnetic resonance imaging (MRI) by reducing k-space data with sub-Nyquist sampling strategies and provides a rationale for why the proposed approach works well. Uniform subsampling is used in the time-consuming phase-encoding direction to capture high-resolution image information, while permitting the image-folding problem dictated by the Poisson summation formula. To deal with the localization uncertainty due to image folding, a small number of low-frequency k-space data are added. Training the deep learning net involves input and output images that are pairs of the Fourier transforms of the subsampled and fully sampled k-space data. Our experiments show the remarkable performance of the proposed method; only 29% of the k-space data can generate images of high quality as effectively as standard MRI reconstruction with the fully sampled data.
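
The image-folding effect invoked above is the standard aliasing result for uniform subsampling (our notation): keeping every M-th of N phase-encoding lines produces a zero-filled reconstruction that superimposes M shifted copies of the image along that direction,

    \[
    \tilde{x}(n) \;=\; \frac{1}{M} \sum_{m=0}^{M-1} x\!\left( n + m\,\frac{N}{M} \right)
    \]
    % This is why a few extra low-frequency k-space lines are needed: they
    % resolve which of the folded copies a given image feature belongs to.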

Journal ArticleDOI
Xian Tao, Dapeng Zhang, Ma Wenzhi, Xilong Liu, De Xu 
TL;DR: This paper discusses the automatic detection of metallic defects with a twofold procedure that accurately localizes and classifies defects appearing in input images captured from real industrial environments using a novel cascaded autoencoder (CASAE) architecture.
Abstract: Automatic metallic surface defect inspection has received increased attention in relation to the quality control of industrial products. Metallic defect detection is usually performed against complex industrial scenarios, presenting an interesting but challenging problem. Traditional methods are based on image processing or shallow machine learning techniques, but these can only detect defects under specific detection conditions, such as obvious defect contours with strong contrast and low noise, at certain scales, or under specific illumination conditions. This paper discusses the automatic detection of metallic defects with a twofold procedure that accurately localizes and classifies defects appearing in input images captured from real industrial environments. A novel cascaded autoencoder (CASAE) architecture is designed for segmenting and localizing defects. The cascading network transforms the input defect image into a pixel-wise prediction mask based on semantic segmentation. The defect regions of segmented results are classified into their specific classes via a compact convolutional neural network (CNN). Metallic defects under various conditions can be successfully detected using an industrial dataset. The experimental results demonstrate that this method meets the robustness and accuracy requirements for metallic defect detection. Meanwhile, it can also be extended to other detection applications.

Journal ArticleDOI
TL;DR: This paper proposes to implement an adversarial autoencoder for the task of retinal vessel network synthesis, and uses the generated vessel trees as an intermediate stage for the generation of color retinal images, which is accomplished with a generative adversarial network.
Abstract: In medical image analysis applications, the availability of large amounts of annotated data is becoming increasingly critical. However, annotated medical data is often scarce and costly to obtain. In this paper, we address the problem of synthesizing retinal color images by applying recent techniques based on adversarial learning. In this setting, a generative model is trained to maximize a loss function provided by a second model attempting to classify its output into real or synthetic. In particular, we propose to implement an adversarial autoencoder for the task of retinal vessel network synthesis. We use the generated vessel trees as an intermediate stage for the generation of color retinal images, which is accomplished with a generative adversarial network. Both models require the optimization of almost everywhere differentiable loss functions, which allows us to train them jointly. The resulting model offers an end-to-end retinal image synthesis system capable of generating as many retinal images as the user requires, with their corresponding vessel networks, by sampling from a simple probability distribution that we impose on the associated latent space. We show that the learned latent space contains a well-defined semantic structure, implying that we can perform calculations in the space of retinal images, e.g., smoothly interpolating new data points between two retinal images. Visual and quantitative results demonstrate that the synthesized images are substantially different from those in the training set, while being also anatomically consistent and displaying a reasonable visual quality.

Journal ArticleDOI
TL;DR: A deep neural network is presented that is specifically designed to provide high-resolution 3-D images from restricted photoacoustic measurements; the network represents an iterative scheme and incorporates gradient information of the data fit to compensate for limited-view artifacts.
Abstract: Recent advances in deep learning for tomographic reconstructions have shown great potential to create accurate and high quality images with a considerable speed up. In this paper, we present a deep neural network that is specifically designed to provide high resolution 3-D images from restricted photoacoustic measurements. The network is designed to represent an iterative scheme and incorporates gradient information of the data fit to compensate for limited view artifacts. Due to the high complexity of the photoacoustic forward operator, we separate training and computation of the gradient information. A suitable prior for the desired image structures is learned as part of the training. The resulting network is trained and tested on a set of segmented vessels from lung computed tomography scans and then applied to in-vivo photoacoustic measurement data.