
Showing papers on "Real image published in 2018"


Proceedings ArticleDOI
Yuhua Chen1, Wen Li1, Luc Van Gool1
18 Jun 2018
TL;DR: This work proposes a new reality oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data that takes advantage of the intrinsic spatial structure presented in urban scene images, and proposes a spatial-aware adaptation scheme to effectively align the distribution of two domains.
Abstract: Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when applying the learned model to real world scenarios. This is mainly due to two reasons: 1) the model overfits to synthetic images, making the convolutional filters incompetent to extract informative representation for real images; 2) there is a distribution difference between synthetic and real data, which is also known as the domain adaptation problem. To this end, we propose a new reality oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data. First, we propose a target guided distillation approach to learn the real image style, which is achieved by training the segmentation model to imitate a pretrained real style model using real images. Second, we further take advantage of the intrinsic spatial structure presented in urban scene images, and propose a spatial-aware adaptation scheme to effectively align the distribution of two domains. These two modules can be readily integrated with existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on Cityscapes dataset by adapting from GTAV and SYNTHIA datasets, where the results demonstrate the effectiveness of our method.
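A minimal sketch of the target-guided distillation idea, assuming PyTorch (the module names and the loss weight are hypothetical, not the paper's exact recipe): on real images, the trainable segmentation backbone is pushed to imitate the features of a frozen model pretrained on real data.

```python
import torch
import torch.nn.functional as F

def target_guided_distillation(seg_backbone, real_style_model, real_images):
    """Sketch: make the segmentation backbone's features on real images imitate
    those of a frozen model pretrained on real data (e.g., ImageNet weights)."""
    with torch.no_grad():
        teacher_feat = real_style_model(real_images)   # frozen "real style" model
    student_feat = seg_backbone(real_images)           # trainable backbone
    return F.mse_loss(student_feat, teacher_feat)

# One training step (lambda_d is a hypothetical weight):
#   loss = seg_loss_on_synthetic + lambda_d * target_guided_distillation(backbone, teacher, real_batch)
```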

346 citations


Journal ArticleDOI
TL;DR: This work presents a new deep learning approach to blending for IBR, in which held-out real image data is used to learn blending weights to combine input photo contributions, and designs the network architecture and the training loss to provide high quality novel view synthesis, while reducing temporal flickering artifacts.
Abstract: Free-viewpoint image-based rendering (IBR) is a standing challenge. IBR methods combine warped versions of input photos to synthesize a novel view. The image quality of this combination is directly affected by geometric inaccuracies of multi-view stereo (MVS) reconstruction and by view- and image-dependent effects that produce artifacts when contributions from different input views are blended. We present a new deep learning approach to blending for IBR, in which we use held-out real image data to learn blending weights to combine input photo contributions. Our Deep Blending method requires us to address several challenges to achieve our goal of interactive free-viewpoint IBR navigation. We first need to provide sufficiently accurate geometry so the Convolutional Neural Network (CNN) can succeed in finding correct blending weights. We do this by combining two different MVS reconstructions with complementary accuracy vs. completeness tradeoffs. To tightly integrate learning in an interactive IBR system, we need to adapt our rendering algorithm to produce a fixed number of input layers that can then be blended by the CNN. We generate training data with a variety of captured scenes, using each input photo as ground truth in a held-out approach. We also design the network architecture and the training loss to provide high quality novel view synthesis, while reducing temporal flickering artifacts. Our results demonstrate free-viewpoint IBR in a wide variety of scenes, clearly surpassing previous methods in visual quality, especially when moving far from the input cameras.

265 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation, and is designed to reflect a physical Lambertian rendering model.
Abstract: We present SfSNet, an end-to-end learning framework for producing an accurate decomposition of an unconstrained human face image into shape, reflectance and illuminance. SfSNet is designed to reflect a physical Lambertian rendering model. SfSNet learns from a mixture of labeled synthetic and unlabeled real-world images. This allows the network to capture low-frequency variations from synthetic images and high-frequency details from real images through the photometric reconstruction loss. SfSNet consists of a new decomposition architecture with residual blocks that learns a complete separation of albedo and normal. This is used along with the original image to predict lighting. SfSNet produces significantly better quantitative and qualitative results than state-of-the-art methods for inverse rendering and independent normal and illumination estimation.
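The physical Lambertian rendering model the decomposition reflects can be sketched directly; below is a small NumPy rendering of image = albedo x shading with second-order spherical-harmonic lighting (the basis constants are absorbed into the lighting coefficients for brevity, so this is an illustration rather than SfSNet's exact renderer).

```python
import numpy as np

def sh_basis(normals):
    """9-dim second-order spherical-harmonic basis per pixel (constant factors
    absorbed into the lighting coefficients). normals: (H, W, 3), unit length."""
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    ones = np.ones_like(nx)
    return np.stack([ones, nx, ny, nz,
                     nx * ny, nx * nz, ny * nz,
                     nx**2 - ny**2, 3 * nz**2 - 1], axis=-1)   # (H, W, 9)

def lambertian_render(albedo, normals, lighting):
    """image = albedo * shading, with shading = SH(normal) . lighting coefficients.
    albedo: (H, W, 3), normals: (H, W, 3), lighting: (9, 3) per-channel SH coefficients."""
    shading = sh_basis(normals) @ lighting                     # (H, W, 3)
    return albedo * shading
```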

256 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, a user can place a texture patch on a sketch at arbitrary locations and scales to control the desired output texture, and a generative network learns to synthesize objects consistent with these texture suggestions.
Abstract: In this paper, we investigate deep image synthesis guided by sketch, color, and texture. Previous image synthesis methods can be controlled by sketch and color strokes but we are the first to examine texture control. We allow a user to place a texture patch on a sketch at arbitrary locations and scales to control the desired output texture. Our generative network learns to synthesize objects consistent with these texture suggestions. To achieve this, we develop a local texture loss in addition to adversarial and content loss to train the generative network. We conduct experiments using sketches generated from real images and textures sampled from a separate texture database and results show that our proposed algorithm is able to generate plausible images that are faithful to user controls. Ablation studies show that our proposed pipeline can generate more realistic images than adapting existing methods directly.

224 citations


Posted Content
TL;DR: A forensic embedding based on a novel autoencoder-based architecture that can be used to distinguish between real and fake imagery is learned, which acts as a form of anomaly detector and shows significant improvements in transferability.
Abstract: Distinguishing manipulated from real images is becoming increasingly difficult as new sophisticated image forgery approaches come out by the day. Naive classification approaches based on Convolutional Neural Networks (CNNs) show excellent performance in detecting image manipulations when they are trained on a specific forgery method. However, on examples from unseen manipulation approaches, their performance drops significantly. To address this limitation in transferability, we introduce Forensic-Transfer (FT). We devise a learning-based forensic detector which adapts well to new domains, i.e., novel manipulation methods, and can handle scenarios where only a handful of fake examples are available during training. To this end, we learn a forensic embedding based on a novel autoencoder-based architecture that can be used to distinguish between real and fake imagery. The learned embedding acts as a form of anomaly detector; namely, an image manipulated by an unseen method will be detected as fake provided it maps sufficiently far away from the cluster of real images. Compared to prior works, FT shows significant improvements in transferability, which we demonstrate in a series of experiments on cutting-edge benchmarks. For instance, on unseen examples, we achieve up to 85% in terms of accuracy, and with only a handful of seen examples, our performance already reaches around 95%.
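The abstract does not spell out the decision rule; as one illustrative reading of an anomaly detector over a partitioned forensic embedding (the dimension split and the energy measure below are assumptions, not the authors' exact recipe), a sketch might look like this in PyTorch.

```python
import torch

def classify_from_embedding(latent, real_dims, fake_dims):
    """Illustrative decision rule over a partitioned forensic embedding (assumed split):
    compare the mean activation energy of the 'real' and 'fake' parts of the latent code.
    latent: (N, D) embeddings; real_dims / fake_dims: index lists for the two partitions."""
    e_real = latent[:, real_dims].abs().mean(dim=1)
    e_fake = latent[:, fake_dims].abs().mean(dim=1)
    return (e_fake > e_real).long()   # 1 = flagged as fake (including unseen manipulations)
```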

200 citations


Book ChapterDOI
08 Sep 2018
TL;DR: A simple trick is shown to be sufficient to train modern object detectors very effectively with synthetic images only: freeze the layers responsible for feature extraction to generic layers pre-trained on real images, and train only the remaining layers with plain OpenGL renderings.
Abstract: Deep Learning methods usually require huge amounts of training data to perform at their full potential, and often require expensive manual labeling. Using synthetic images is therefore very attractive to train object detectors, as the labeling comes for free, and several approaches have been proposed to combine synthetic and real images for training. In this paper, we evaluate whether ‘freezing’ the layers responsible for feature extraction to generic layers pre-trained on real images, and training only the remaining layers with plain OpenGL renderings, allows object detectors to be trained with synthetic images only. Our experiments with very recent deep architectures for object recognition (Faster-RCNN, R-FCN, Mask-RCNN) and image feature extractors (InceptionResnet and Resnet) show this simple approach performs surprisingly well.
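A minimal sketch of the "freeze the feature extractor" trick, assuming PyTorch and a torchvision Faster R-CNN as a stand-in for the detectors evaluated in the paper: the backbone pretrained on real images is excluded from optimization, and only the remaining layers are trained on rendered data.

```python
import torch
import torchvision

# Detector whose backbone was pretrained on real images
# (recent torchvision uses weights=...; older versions use pretrained=True).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Freeze the layers responsible for feature extraction...
for param in model.backbone.parameters():
    param.requires_grad = False

# ...and optimize only the remaining layers (RPN and detection heads)
# on synthetically rendered, automatically labeled images.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
```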

189 citations


Journal ArticleDOI
TL;DR: A spatial-neighbor-based noise filter is developed to further reduce the false alarms and missing detections using belief functions theory and to improve the robustness of detection with respect to the noise and heterogeneousness (modality difference) of images.
Abstract: Change detection in heterogeneous remote sensing images remains an important and open problem for damage assessment. We propose a new change detection method for heterogeneous images (i.e., SAR and optical images) based on homogeneous pixel transformation (HPT). HPT transfers one image from its original feature space (e.g., gray space) to another space (e.g., spectral space) in pixel-level to make the pre-event and post-event images represented in a common space for the convenience of change detection. HPT consists of two operations, i.e., the forward transformation and the backward transformation. In forward transformation, for each pixel of pre-event image in the first feature space, we will estimate its mapping pixel in the second space corresponding to post-event image based on the known unchanged pixels. A multi-value estimation method with noise tolerance is introduced to determine the mapping pixel using the K-nearest neighbors technique. Once the mapping pixels of pre-event image are available, the difference values between the mapping image and the post-event image can be directly calculated. After that, we will similarly do the backward transformation to associate the post-event image with the first space, and one more difference value for each pixel will be obtained. Then, the two difference values are combined to improve the robustness of detection with respect to the noise and heterogeneousness (modality difference) of images. The fuzzy c-means clustering algorithm is employed to divide the integrated difference values into two clusters: changed pixels and unchanged pixels. These detection results may contain some noisy regions (i.e., small error detections), and we develop a spatial-neighbor-based noise filter to further reduce the false alarms and missing detections using belief functions theory. The experiments for change detection with real images (e.g., SPOT, ERS, and NDVI) during a flood in the U.K. are given to validate the effectiveness of the proposed method.
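A hedged sketch of the forward transformation using scikit-learn's k-nearest-neighbors regressor, a simplification of the paper's multi-value, noise-tolerant estimator: known unchanged pixels define the pre-to-post mapping, and the gap between the mapped and observed post-event image gives one of the two difference values.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def forward_difference(pre_img, post_img, unchanged_mask, k=10):
    """Forward HPT (simplified): learn a pre->post pixel mapping on unchanged pixels,
    then compare the mapped pre-event image with the observed post-event image.
    pre_img, post_img: (H, W) single-band images; unchanged_mask: (H, W) bool."""
    X = pre_img[unchanged_mask].reshape(-1, 1)
    y = post_img[unchanged_mask]
    knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    mapped = knn.predict(pre_img.reshape(-1, 1)).reshape(pre_img.shape)
    return np.abs(mapped - post_img)

# The backward pass swaps the roles of pre_img and post_img; the two difference
# maps are then fused and clustered (e.g., fuzzy c-means) into changed / unchanged pixels.
```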

171 citations


Book ChapterDOI
08 Sep 2018
TL;DR: A novel GAN-based SISR method that overcomes the limitation and produces more realistic results by attaching an additional discriminator that works in the feature domain and design a new generator that utilizes long-range skip connections so that information between distant layers can be transferred more effectively.
Abstract: Generative adversarial networks (GANs) have recently been adopted to single image super-resolution (SISR) and showed impressive results with realistically synthesized high-frequency textures. However, the results of such GAN-based approaches tend to include less meaningful high-frequency noise that is irrelevant to the input image. In this paper, we propose a novel GAN-based SISR method that overcomes the limitation and produces more realistic results by attaching an additional discriminator that works in the feature domain. Our additional discriminator encourages the generator to produce structural high-frequency features rather than noisy artifacts as it distinguishes synthetic and real images in terms of features. We also design a new generator that utilizes long-range skip connections so that information between distant layers can be transferred more effectively. Experiments show that our method achieves the state-of-the-art performance in terms of both PSNR and perceptual quality compared to recent GAN-based methods.

166 citations


Book ChapterDOI
08 Sep 2018
TL;DR: A framework that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network that can be trained end-to-end, leading to good results, even surpassing early deep-learning methods that use real paired data.
Abstract: Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire. We propose a framework, trained on synthetic image-depth pairs and unpaired real images, that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network. A key idea is having the first network act as a wide-spectrum input translator, taking in either synthetic or real images, and ideally producing minimally modified realistic images. This is done via a reconstruction loss when the training input is real, and GAN loss when synthetic, removing the need for heuristic self-regularization. The second network is trained on a task loss for synthetic image-depth pairs, with extra GAN loss to unify real and synthetic feature distributions. Importantly, the framework can be trained end-to-end, leading to good results, even surpassing early deep-learning methods that use real paired data.
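One plausible way to write down the translator's two training signals in PyTorch (module names are hypothetical and loss weights are omitted): the reconstruction term keeps real inputs minimally modified, while the adversarial term pushes translated synthetic inputs toward the real-image distribution.

```python
import torch
import torch.nn.functional as F

def translator_loss(T, D_img, synthetic_batch, real_batch):
    """Wide-spectrum translator objective (sketch): real inputs should pass through
    nearly unchanged (reconstruction term), while translated synthetic inputs should
    fool an image discriminator trained to recognize real images (adversarial term)."""
    recon = F.l1_loss(T(real_batch), real_batch)
    logits = D_img(T(synthetic_batch))
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return recon + adv

# The depth network adds a task loss on synthetic pairs, e.g.
#   F.l1_loss(depth_net(T(synthetic_batch)), synthetic_depth)
# plus a feature-level GAN loss to align real and synthetic feature distributions.
```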

164 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: In this article, a graph-cut RANSAC (GC-RANSAC) algorithm is proposed to separate inliers and outliers in the local optimization step, which is applied when a so-far-the-best model is found.
Abstract: A novel method for robust estimation, called Graph-Cut RANSAC, GC-RANSAC in short, is introduced. To separate inliers and outliers, it runs the graph-cut algorithm in the local optimization (LO) step which is applied when a so-far-the-best model is found. The proposed LO step is conceptually simple, easy to implement, globally optimal and efficient. GC-RANSAC is shown experimentally, both on synthesized tests and real image pairs, to be more geometrically accurate than state-of-the-art methods on a range of problems, e.g. line fitting, homography, affine transformation, fundamental and essential matrix estimation. It runs in real-time for many problems at a speed approximately equal to that of the less accurate alternatives (in milliseconds on standard CPU).
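To make the role of the local optimization (LO) step concrete, here is a minimal LO-RANSAC skeleton for 2D line fitting in NumPy; the comment marks the point where GC-RANSAC runs its graph-cut-based inlier labeling, for which this sketch substitutes a plain least-squares refit.

```python
import numpy as np

def fit_line(pts):
    """Total-least-squares line fit; returns (n, d) with n . x = d and |n| = 1."""
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]                          # direction of least variance = line normal
    return n, float(n @ centroid)

def ransac_line(points, thresh=1.0, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    best = (None, None, np.zeros(len(points), dtype=bool))
    for _ in range(iters):
        n, d = fit_line(points[rng.choice(len(points), 2, replace=False)])
        inliers = np.abs(points @ n - d) < thresh
        if inliers.sum() > best[2].sum():
            # A so-far-the-best model was found: this is where GC-RANSAC runs its
            # graph-cut local optimization (spatially coherent inlier/outlier labeling).
            # This sketch substitutes a plain least-squares refit on the current inliers.
            n, d = fit_line(points[inliers])
            inliers = np.abs(points @ n - d) < thresh
            best = (n, d, inliers)
    return best
```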

159 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper proposes a simple and efficient method for exploiting synthetic images when training a Deep Network to predict a 3D pose from an image, and shows that it performs very well in practice, and inference is faster and more accurate than with an exemplar-based approach.
Abstract: We propose a simple and efficient method for exploiting synthetic images when training a Deep Network to predict a 3D pose from an image. The ability of using synthetic images for training a Deep Network is extremely valuable as it is easy to create a virtually infinite training set made of such images, while capturing and annotating real images can be very cumbersome. However, synthetic images do not resemble real images exactly, and using them for training can result in suboptimal performance. It was recently shown that for exemplar-based approaches, it is possible to learn a mapping from the exemplar representations of real images to the exemplar representations of synthetic images. In this paper, we show that this approach is more general, and that a network can also be applied after the mapping to infer a 3D pose: At run-time, given a real image of the target object, we first compute the features for the image, map them to the feature space of synthetic images, and finally use the resulting features as input to another network which predicts the 3D pose. Since this network can be trained very effectively by using synthetic images, it performs very well in practice, and inference is faster and more accurate than with an exemplar-based approach. We demonstrate our approach on the LINEMOD dataset for 3D object pose estimation from color images, and the NYU dataset for 3D hand pose estimation from depth maps. We show that it allows us to outperform the state-of-the-art on both datasets.
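A minimal PyTorch sketch of the run-time pipeline described above (layer sizes and the pose parameterization are hypothetical): real-image features are first mapped into the synthetic feature space and then passed to a pose network trained only on synthetic renderings.

```python
import torch
import torch.nn as nn

feat_dim, pose_dim = 512, 7          # hypothetical sizes (e.g., rotation + translation)

mapping = nn.Sequential(             # real-feature -> synthetic-feature mapping
    nn.Linear(feat_dim, feat_dim), nn.ReLU(),
    nn.Linear(feat_dim, feat_dim),
)
pose_net = nn.Sequential(            # pose regressor trained on synthetic features only
    nn.Linear(feat_dim, 256), nn.ReLU(),
    nn.Linear(256, pose_dim),
)

def predict_pose(real_features):
    """At run time: map real-image features into the synthetic feature space,
    then regress the 3D pose with the synthetically trained network."""
    with torch.no_grad():
        return pose_net(mapping(real_features))

pose = predict_pose(torch.randn(1, feat_dim))   # dummy feature vector
```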

Book ChapterDOI
08 Sep 2018
TL;DR: A drastically different way to handle synthetic images that does not require seeing any real images at training time is introduced, which builds on the observation that foreground and background classes are not affected in the same manner by the domain shift, and thus should be treated differently.
Abstract: Training a deep network to perform semantic segmentation requires large amounts of labeled data. To alleviate the manual effort of annotating real images, researchers have investigated the use of synthetic data, which can be labeled automatically. Unfortunately, a network trained on synthetic data performs relatively poorly on real images. While this can be addressed by domain adaptation, existing methods all require having access to real images during training. In this paper, we introduce a drastically different way to handle synthetic images that does not require seeing any real images at training time. Our approach builds on the observation that foreground and background classes are not affected in the same manner by the domain shift, and thus should be treated differently. In particular, the former should be handled in a detection-based manner to better account for the fact that, while their texture in synthetic images is not photo-realistic, their shape looks natural. Our experiments evidence the effectiveness of our approach on Cityscapes and CamVid with models trained on synthetic data only.

Book ChapterDOI
08 Sep 2018
TL;DR: This work proposes a novel loss function, i.e., the Conservative Loss, which penalizes the extreme good and bad cases while encouraging the moderate examples, and enables the network to learn features that are discriminative via gradient descent and invariant to the change of domains via the gradient ascent method.
Abstract: Due to the expensive and time-consuming annotations (e.g., segmentation) for real-world images, recent works in computer vision resort to synthetic data. However, the performance on real images often drops significantly because of the domain shift between the synthetic data and the real images. In this setting, domain adaptation brings an appealing option. Effective domain adaptation approaches shape representations that (1) are discriminative for the main task and (2) have good generalization capability under domain shift. To this end, we propose a novel loss function, i.e., the Conservative Loss, which penalizes the extreme good and bad cases while encouraging the moderate examples. More specifically, it enables the network to learn features that are discriminative via gradient descent and invariant to the change of domains via the gradient ascent method. Extensive experiments on synthetic-to-real segmentation adaptation show our proposed method achieves state-of-the-art results. Ablation studies give more insights into properties of the Conservative Loss. Exploratory experiments and discussion demonstrate that our Conservative Loss has good flexibility rather than being restricted to an exact form.

Book ChapterDOI
16 Sep 2018
TL;DR: In this paper, an active learning (AL) framework is proposed to select the most informative samples and add them to the training data; it is able to achieve state-of-the-art performance using about 35% of the full dataset.
Abstract: Training robust deep learning (DL) systems for medical image classification or segmentation is challenging due to limited images covering different disease types and severity. We propose an active learning (AL) framework to select the most informative samples and add them to the training data. We use conditional generative adversarial networks (cGANs) to generate realistic chest X-ray images with different disease characteristics by conditioning their generation on a real image sample. Informative samples to add to the training set are identified using a Bayesian neural network. Experiments show our proposed AL framework is able to achieve state-of-the-art performance by using about 35% of the full dataset, thus saving significant time and effort over conventional methods.
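The informativeness criterion comes from a Bayesian neural network; a common practical stand-in (an assumption here, not the authors' exact model) is Monte-Carlo dropout, sketched below for ranking candidate samples to add to the training set.

```python
import torch

def mc_dropout_uncertainty(model, images, passes=20):
    """Predictive uncertainty via Monte-Carlo dropout (stand-in for a Bayesian NN).
    Keeping the model in train mode leaves dropout stochastic at inference time
    (note: this also affects batch-norm statistics; acceptable for a sketch)."""
    model.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(images), dim=1) for _ in range(passes)])
    return probs.var(dim=0).mean(dim=1)          # (N,) per-sample uncertainty score

def select_informative(model, candidate_images, budget=32):
    """Pick the `budget` most uncertain candidates (real or cGAN-generated) to label and add."""
    scores = mc_dropout_uncertainty(model, candidate_images)
    return torch.topk(scores, budget).indices
```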

Journal ArticleDOI
TL;DR: This study presents a new image-based indoor localization method using building information modeling (BIM) and convolutional neural networks (CNNs) that constructs a dataset with rendered BIM images and searches the dataset for images most similar to indoor photographs, thereby estimating the indoor position and orientation of the photograph.

Posted Content
TL;DR: This paper proposes a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data, based on solving a simple non-convex optimization problem in a single scalar variable.
Abstract: Event-based cameras can measure intensity changes (called 'events') with microsecond accuracy under high-speed motion and challenging lighting conditions. With the active pixel sensor (APS), the event camera allows simultaneous output of the intensity frames. However, the output images are captured at a relatively low frame-rate and often suffer from motion blur. A blurry image can be regarded as the integral of a sequence of latent images, while the events indicate the changes between the latent images. Therefore, we are able to model the blur-generation process by associating event data to a latent image. In this paper, we propose a simple and effective approach, the Event-based Double Integral (EDI) model, to reconstruct a high frame-rate, sharp video from a single blurry frame and its event data. The video generation is based on solving a simple non-convex optimization problem in a single scalar variable. Experimental results on both synthetic and real images demonstrate the superiority of our EDI model and optimization method in comparison to the state-of-the-art.
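Written out, the relation the abstract describes between the blurry frame B, the latent sharp frame L(f) and the integrated events takes roughly the following form (a sketch reconstructed from the abstract's description; c is the event contrast threshold, i.e., the single scalar being optimized):

```latex
% Blurry frame B as the temporal average of latent frames L(t) over the exposure T,
% with each latent frame tied to the sharp frame L(f) through the integrated events:
B = \frac{1}{T}\int_{f-T/2}^{f+T/2} L(t)\,\mathrm{d}t, \qquad
L(t) = L(f)\,\exp\bigl(c\,E(t)\bigr), \qquad
E(t) = \int_{f}^{t} e(s)\,\mathrm{d}s,
% so the sharp frame follows in closed form once c is known:
L(f) = \frac{B}{\dfrac{1}{T}\int_{f-T/2}^{f+T/2}\exp\bigl(c\,E(t)\bigr)\,\mathrm{d}t}.
```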

Proceedings ArticleDOI
01 Sep 2018
TL;DR: A fully supervised deep network is proposed which learns to jointly estimate a full 3D hand mesh representation and pose from a single depth image to improve model based learning (hybrid) methods' results on two of the public benchmarks.
Abstract: Articulated hand pose and shape estimation is an important problem for vision-based applications such as augmented reality and animation. In contrast to the existing methods which optimize only for joint positions, we propose a fully supervised deep network which learns to jointly estimate a full 3D hand mesh representation and pose from a single depth image. To this end, a CNN architecture is employed to estimate parametric representations, i.e., hand pose, bone scales and complex shape parameters. Then, a novel hand pose and shape layer, embedded inside our deep framework, produces 3D joint positions and hand mesh. Lack of sufficient training data with varying hand shapes limits the generalized performance of learning-based methods. Also, manually annotating real data is suboptimal. Therefore, we present SynHand5M: a million-scale synthetic benchmark with accurate joint annotations, segmentation masks and mesh files of depth maps. Among model-based learning (hybrid) methods, we show improved results on two of the public benchmarks, i.e., NYU and ICVL. Also, by employing a joint training strategy with real and synthetic data, we recover 3D hand mesh and pose from real images in 30 ms.

Proceedings ArticleDOI
12 Mar 2018
TL;DR: Deep Generative Correlation Alignment Network (DGCAN) as discussed by the authors leverages a shape-preserving loss and a low-level statistic matching loss to minimize the domain discrepancy between synthetic and real images in deep feature space.
Abstract: Synthetic images rendered from 3D CAD models are useful for augmenting training data for object recognition algorithms. However, the generated images are non-photorealistic and do not match real image statistics. This leads to a large domain discrepancy, causing models trained on synthetic data to perform poorly on real domains. Recent work has shown the great potential of deep convolutional neural networks to generate realistic images, but has not utilized generative models to address synthetic-to-real domain adaptation. In this work, we propose a Deep Generative Correlation Alignment Network (DGCAN) to synthesize images using a novel domain adaptation algorithm. DGCAN leverages a shape-preserving loss and a low-level statistic matching loss to minimize the domain discrepancy between synthetic and real images in deep feature space. Experimentally, we show training off-the-shelf classifiers on the newly generated data can significantly boost performance when testing on the real image domains (PASCAL VOC 2007 benchmark and Office dataset), improving upon several existing methods.
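The network's name points to correlation alignment; a plausible form of the low-level statistic matching term, borrowed from the standard Deep CORAL loss rather than taken from the paper itself, is sketched below in PyTorch.

```python
import torch

def coral_loss(source_feat, target_feat):
    """Correlation-alignment-style statistic matching (a plausible stand-in for the
    low-level statistic matching loss): penalize the distance between the feature
    covariances of the synthetic (source) and real (target) domains.
    source_feat, target_feat: (N, D) feature matrices."""
    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (x.size(0) - 1)
    d = source_feat.size(1)
    return ((covariance(source_feat) - covariance(target_feat)) ** 2).sum() / (4 * d * d)
```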

Posted Content
TL;DR: This paper shows that applying a straightforward modification to an existing photorealistic style transfer algorithm yields state-of-the-art synthetic-to-real domain adaptation, exceeding the performance of any current GAN-based image translation approach as measured by segmentation and object detection metrics.
Abstract: Deep neural networks have largely failed to effectively utilize synthetic data when applied to real images due to the covariate shift problem. In this paper, we show that by applying a straightforward modification to an existing photorealistic style transfer algorithm, we achieve state-of-the-art synthetic-to-real domain adaptation results. We conduct extensive experimental validations on four synthetic-to-real tasks for semantic segmentation and object detection, and show that our approach exceeds the performance of any current state-of-the-art GAN-based image translation approach as measured by segmentation and object detection metrics. Furthermore we offer a distance based analysis of our method which shows a dramatic reduction in Frechet Inception distance between the source and target domains, offering a quantitative metric that demonstrates the effectiveness of our algorithm in bridging the synthetic-to-real gap.
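The distance-based analysis relies on the Fréchet Inception distance between source and target feature distributions; a compact NumPy/SciPy sketch of that metric, computed over two sets of (e.g., Inception) features, is shown below.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feat_a, feat_b):
    """Frechet (Inception) distance between two feature sets
    (rows = images, columns = e.g. Inception pool activations)."""
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    cov_a = np.cov(feat_a, rowvar=False)
    cov_b = np.cov(feat_b, rowvar=False)
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):        # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_a - mu_b) ** 2) + np.trace(cov_a + cov_b - 2 * covmean))
```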

Proceedings ArticleDOI
02 Mar 2018
TL;DR: In this article, an unsupervised synthesis of T1-weighted brain MRI using a Generative Adversarial Network (GAN) by learning from 528 examples of 2D axial slices of brain MRI was proposed.
Abstract: An important task in image processing and neuroimaging is to extract quantitative information from the acquired images in order to make observations about the presence of disease or markers of development in populations. Having a low-dimensional manifold of an image allows for easier statistical comparisons between groups and the synthesis of group representatives. Previous studies have sought to identify the best mapping of brain MRI to a low-dimensional manifold, but have been limited by assumptions of explicit similarity measures. In this work, we use deep learning techniques to investigate implicit manifolds of normal brains and generate new, high-quality images. We explore implicit manifolds by addressing the problems of image synthesis and image denoising as important tools in manifold learning. First, we propose the unsupervised synthesis of T1-weighted brain MRI using a Generative Adversarial Network (GAN) by learning from 528 examples of 2D axial slices of brain MRI. Synthesized images were first shown to be unique by performing a cross-correlation with the training set. Real and synthesized images were then assessed in a blinded manner by two imaging experts providing an image quality score of 1-5. The quality score of the synthetic image showed substantial overlap with that of the real images. Moreover, we use an autoencoder with skip connections for image denoising, showing that the proposed method results in higher PSNR than FSL SUSAN after denoising. This work shows the power of artificial networks to synthesize realistic imaging data, which can be used to improve image processing techniques and provide a quantitative framework for studying structural changes in the brain.

Journal ArticleDOI
TL;DR: The results show that the proposed method outperforms state-of-the-art methods numerically and visually and is the first report of numerical evaluation of line segment detection on real images.
Abstract: This paper proposes a method for line segment detection in digital images. We propose a novel linelet-based representation to model intrinsic properties of line segments in rasterized image space. Based on this, line segment detection, validation, and aggregation frameworks are constructed. For a numerical evaluation on real images, we propose a new benchmark dataset of real images with annotated lines called YorkUrban-LineSegment. The results show that the proposed method outperforms state-of-the-art methods numerically and visually. To our best knowledge, this is the first report of numerical evaluation of line segment detection on real images.

Book ChapterDOI
08 Sep 2018
TL;DR: A deep learning-based method to generate full 3D hair geometry from an unconstrained image and demonstrates the effectiveness and robustness of the method on a wide range of challenging real Internet pictures, and shows reconstructed hair sequences from videos.
Abstract: We introduce a deep learning-based method to generate full 3D hair geometry from an unconstrained image. Our method can recover local strand details and has real-time performance. State-of-the-art hair modeling techniques rely on large hairstyle collections for nearest neighbor retrieval and then perform ad-hoc refinement. Our deep learning approach, in contrast, is highly efficient in storage and can run 1000 times faster while generating hair with 30K strands. The convolutional neural network takes the 2D orientation field of a hair image as input and generates strand features that are evenly distributed on the parameterized 2D scalp. We introduce a collision loss to synthesize more plausible hairstyles, and the visibility of each strand is also used as a weight term to improve the reconstruction accuracy. The encoder-decoder architecture of our network naturally provides a compact and continuous representation for hairstyles, which allows us to interpolate naturally between hairstyles. We use a large set of rendered synthetic hair models to train our network. Our method scales to real images because an intermediate 2D orientation field, automatically calculated from the real image, factors out the difference between synthetic and real hairs. We demonstrate the effectiveness and robustness of our method on a wide range of challenging real Internet pictures, and show reconstructed hair sequences from videos.

Journal ArticleDOI
TL;DR: The derived experimental results demonstrate the superior performance of the proposed framework in providing an accurate 3D model, especially when dealing with acquired UAV images containing repetitive pattern and significant image distortions.
Abstract: Accurate 3D reconstruction/modelling from unmanned aerial vehicle (UAV)-based imagery has become the key prerequisite in various applications. Although current commercial software has automated the process of image-based reconstruction, a transparent system that can incorporate different user-defined constraints is still preferred by the photogrammetric research community. In this regard, this paper presents a transparent framework for the automated aerial triangulation of UAV images. The proposed framework is conducted in three steps. In the first step, two approaches, which take advantage of prior information regarding the flight trajectory, are implemented for reliable relative orientation recovery. Then, initial recovery of image exterior orientation parameters (EOPs) is achieved through either an incremental or global approach. Finally, a global bundle adjustment involving Ground Control Points (GCPs) and check points is carried out to refine all estimated parameters in the defined mapping coordinate system. Four real image datasets, which are acquired by two different UAV platforms, have been utilized to evaluate the feasibility of the proposed framework. In addition, a comparative analysis between the proposed framework and the existing commercial software is performed. The derived experimental results demonstrate the superior performance of the proposed framework in providing an accurate 3D model, especially when dealing with acquired UAV images containing repetitive patterns and significant image distortions.

Journal ArticleDOI
TL;DR: This paper presents the first strategy to deal with optical images characterized by dissimilar spatial and spectral resolutions, and an experimental protocol is specifically designed, relying on synthetic yet physically plausible change rules applied to real images.
Abstract: Change detection (CD) is one of the most challenging issues when analyzing remotely sensed images. Comparing several multidate images acquired through the same kind of sensor is the most common scenario. Conversely, designing robust, flexible, and scalable algorithms for CD becomes even more challenging when the images have been acquired by two different kinds of sensors. This situation arises in the case of emergency under critical constraints. This paper presents, to the best of our knowledge, the first strategy to deal with optical images characterized by dissimilar spatial and spectral resolutions. Typical considered scenarios include CD between panchromatic, multispectral, and hyperspectral images. The proposed strategy consists of a three-step procedure: 1) inferring a high spatial and spectral resolution image by fusion of the two observed images characterized one by a low spatial resolution and the other by a low spectral resolution; 2) predicting two images with, respectively, the same spatial and spectral resolutions as the observed images by degrading the fused one; and 3) applying a decision rule to each pair of observed and predicted images characterized by the same spatial and spectral resolutions to identify changes. To quantitatively assess the performance of the method, an experimental protocol is specifically designed, relying on synthetic yet physically plausible change rules applied to real images. The accuracy of the proposed framework is finally illustrated on real images.
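A rough sketch of steps 2 and 3 of the procedure in NumPy/SciPy, with simple stand-ins for the sensor models (Gaussian blur plus downsampling for the spatial degradation, a band-mixing response matrix for the spectral one) and the fusion of step 1 taken as given; thresholds and degradation parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def spatial_degrade(img, factor, blur=1.0):
    """Simulate the low-spatial-resolution sensor: blur, then downsample (img: H x W x B)."""
    return zoom(gaussian_filter(img, sigma=(blur, blur, 0)), (1 / factor, 1 / factor, 1), order=1)

def spectral_degrade(img, response):
    """Simulate the low-spectral-resolution sensor: mix bands with a B x b response matrix."""
    return img @ response

def change_map(fused, obs_low_spatial, obs_low_spectral, response, factor, thresh):
    """Steps 2-3 of the protocol: re-degrade the fused image to each observed geometry
    and flag pixels where prediction and observation disagree (step 1, the fusion itself,
    is taken as given here)."""
    d_spatial = np.abs(spatial_degrade(fused, factor) - obs_low_spatial).sum(axis=-1)
    d_spectral = np.abs(spectral_degrade(fused, response) - obs_low_spectral).sum(axis=-1)
    # upsample the coarse difference map to the fine grid before combining
    d_spatial = zoom(d_spatial, np.array(d_spectral.shape) / np.array(d_spatial.shape), order=1)
    return (d_spatial > thresh) | (d_spectral > thresh)
```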

Posted Content
TL;DR: DU-drive as discussed by the authors is an unsupervised real-to-virtual domain unification framework for end-to-end autonomous driving, which first transforms real driving data to its less complex counterpart in the virtual domain and then predicts vehicle control commands from the generated virtual image.
Abstract: In the spectrum of vision-based autonomous driving, vanilla end-to-end models are not interpretable and suboptimal in performance, while mediated perception models require additional intermediate representations such as segmentation masks or detection bounding boxes, whose annotation can be prohibitively expensive as we move to a larger scale. More critically, all prior works fail to deal with the notorious domain shift if we were to merge data collected from different sources, which greatly hinders the model generalization ability. In this work, we address the above limitations by taking advantage of virtual data collected from driving simulators, and present DU-drive, an unsupervised real-to-virtual domain unification framework for end-to-end autonomous driving. It first transforms real driving data to its less complex counterpart in the virtual domain and then predicts vehicle control commands from the generated virtual image. Our framework has three unique advantages: 1) it maps driving data collected from a variety of source distributions into a unified domain, effectively eliminating domain shift; 2) the learned virtual representation is simpler than the input real image and closer in form to the "minimum sufficient statistic" for the prediction task, which relieves the burden of the compression phase while optimizing the information bottleneck tradeoff and leads to superior prediction performance; 3) it takes advantage of annotated virtual data which is unlimited and free to obtain. Extensive experiments on two public driving datasets and two driving simulators demonstrate the performance superiority and interpretive capability of DU-drive.

Posted Content
TL;DR: This work uses conditional generative adversarial networks (cGANs) to generate realistic chest X-ray images with different disease characteristics by conditioning their generation on a real image sample.
Abstract: Training robust deep learning (DL) systems for medical image classification or segmentation is challenging due to limited images covering different disease types and severity. We propose an active learning (AL) framework to select the most informative samples and add them to the training data. We use conditional generative adversarial networks (cGANs) to generate realistic chest X-ray images with different disease characteristics by conditioning their generation on a real image sample. Informative samples to add to the training set are identified using a Bayesian neural network. Experiments show our proposed AL framework is able to achieve state-of-the-art performance by using about 35% of the full dataset, thus saving significant time and effort over conventional methods.

Journal ArticleDOI
TL;DR: A general framework is presented to improve fuzzy-clustering-based segmentation of noisy images by integrating the guided filter in a new way, and it is proved that memberships post-processed by the guided filter still retain the property usually required by fuzzy clustering: for each data point, the sum of its memberships is one.


Journal ArticleDOI
TL;DR: A super multi-view (SMV) technique is applied to near-eye displays to solve the vergence-accommodation conflict that causes visual fatigue, and a comparison of full-parallax and horizontal-parallax SMV images is provided.
Abstract: A super multi-view (SMV) technique is applied to near-eye displays to solve the vergence–accommodation conflict that causes visual fatigue. The proposed SMV near-eye display employs a high-speed spatial light modulator (SLM), a two-dimensional (2D) light source array, and an imaging optics for each eye. The imaging optics produces a virtual image of the SLM and real images of the light sources to generate a 2D array of viewpoints. The SMV images are generated using a time-multiplexing technique: the multiple light sources sequentially emit light while the SLM synchronously displays corresponding parallax images. A monocular experimental system was constructed using a ferroelectric liquid crystal display and an LED array. A full-parallax SMV image generation with 21 viewpoints was demonstrated, and a comparison of full-parallax and horizontal-parallax SMV images was provided.

Journal ArticleDOI
TL;DR: The results show that the proposed technique has greater accuracy and consistency in measuring crack width compared with the conventional technique; residual noise is removed through reclassification of crack regions based on image-adaptive thresholding.
Abstract: This paper describes an image-based methodology for the detection of structural cracks in concrete. The conventional approach based on line enhancement filtering has problems associated with an inaccurate extraction of edge pixels. We therefore propose an edge-based crack detection technique consisting of five steps: crack width transform, aspect ratio filtering, crack region search, hole filling, and relative thresholding. In the first step, crack width transform, opposing edges are identified and classified as crack candidate pixels, and a width map is generated. In the second step, aspect ratio filtering, the width map generated is used to remove noise. In the next two steps, the method searches for and restores missing pixels. In the last step, relative thresholding, residual noise is removed through reclassification of crack regions based on image-adaptive thresholding. The performance of this technique was tested using synthetic and real images. The results show that the proposed technique has greater accuracy and consistency in measuring crack width compared with the conventional technique.
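As an illustration of one of the five steps, here is a hedged sketch of the aspect-ratio filtering stage using scikit-image; the elongation measure and threshold are assumptions, since the paper's criterion is defined on its width map rather than on ellipse axes.

```python
import numpy as np
from skimage.measure import label, regionprops

def aspect_ratio_filter(candidate_mask, min_ratio=3.0):
    """Keep only crack-candidate regions that are elongated enough to be cracks,
    judged here by the ratio of major to minor ellipse axis length (illustrative).
    candidate_mask: boolean map from the crack width transform."""
    labeled = label(candidate_mask)
    keep = np.zeros_like(candidate_mask, dtype=bool)
    for region in regionprops(labeled):
        ratio = region.major_axis_length / max(region.minor_axis_length, 1e-6)
        if ratio >= min_ratio:
            keep[labeled == region.label] = True
    return keep
```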