
Showing papers on "Upsampling published in 2017"


Journal ArticleDOI
TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and is the most memory-efficient at inference as compared to other architectures, including FCN and DeconvNet.
Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well-known DeepLab-LargeFOV [3] and DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and is the most memory-efficient at inference as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/.
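The index-based unpooling described above maps directly onto standard deep learning primitives. Below is a minimal PyTorch sketch of one encoder/decoder stage, with illustrative channel sizes rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

# Encoder pooling keeps the argmax indices; the decoder reuses them to unpool.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
decoder_conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # trainable densifying filters

x = torch.randn(1, 64, 32, 32)        # encoder feature map (sizes are placeholders)
pooled, indices = pool(x)             # indices recorded during max-pooling
sparse = unpool(pooled, indices)      # non-linear upsampling, no learned upsampling weights
dense = decoder_conv(sparse)          # sparse map convolved into a dense feature map
```

This mirrors the two-step decoder described in the abstract: parameter-free upsampling via the stored pooling indices, followed by trainable convolutions that densify the result.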

13,468 citations


Proceedings ArticleDOI
12 Apr 2017
TL;DR: In this paper, the Laplacian pyramid super-resolution network (LapSRN) is proposed to progressively reconstruct the sub-band residuals of high-resolution images.
Abstract: Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.
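The per-level operations named above (transposed-convolution upsampling, residual prediction, Charbonnier loss) can be sketched compactly. This is a hedged PyTorch illustration with placeholder channel counts and a single convolution standing in for the paper's deeper feature extractor:

```python
import torch
import torch.nn as nn

class LapLevel(nn.Module):
    """One pyramid level: upsample features 2x, predict a sub-band residual."""
    def __init__(self, ch=64):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                      nn.LeakyReLU(0.2))
        self.up_feat = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)  # learned 2x upsampling
        self.to_res = nn.Conv2d(ch, 1, 3, padding=1)                       # high-frequency residual
        self.up_img = nn.ConvTranspose2d(1, 1, 4, stride=2, padding=1)     # coarse image upsampling

    def forward(self, feat, img):
        feat = self.up_feat(self.features(feat))
        img = self.up_img(img) + self.to_res(feat)   # coarse upsampled image + residual
        return feat, img

def charbonnier(pred, target, eps=1e-3):
    # Robust Charbonnier loss, applied at every level for deep supervision.
    return torch.sqrt((pred - target) ** 2 + eps ** 2).mean()
```

Stacking several such levels and supervising each intermediate image is what yields the multi-scale predictions in one feed-forward pass.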

1,651 citations


Posted Content
TL;DR: This paper proposes the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images and generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications.
Abstract: Convolutional neural networks have recently demonstrated high-quality reconstruction for single-image super-resolution. In this paper, we propose the Laplacian Pyramid Super-Resolution Network (LapSRN) to progressively reconstruct the sub-band residuals of high-resolution images. At each pyramid level, our model takes coarse-resolution feature maps as input, predicts the high-frequency residuals, and uses transposed convolutions for upsampling to the finer level. Our method does not require the bicubic interpolation as the pre-processing step and thus dramatically reduces the computational complexity. We train the proposed LapSRN with deep supervision using a robust Charbonnier loss function and achieve high-quality reconstruction. Furthermore, our network generates multi-scale predictions in one feed-forward pass through the progressive reconstruction, thereby facilitating resource-aware applications. Extensive quantitative and qualitative evaluations on benchmark datasets show that the proposed algorithm performs favorably against the state-of-the-art methods in terms of speed and accuracy.

1,417 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: In this article, the authors extend DenseNets to semantic segmentation and achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module nor pretraining.
Abstract: State-of-the-art approaches for semantic image segmentation are built on Convolutional Neural Networks (CNNs). The typical segmentation architecture is composed of (a) a downsampling path responsible for extracting coarse semantic features, followed by (b) an upsampling path trained to recover the input image resolution at the output of the model and, optionally, (c) a post-processing module (e.g. Conditional Random Fields) to refine the model predictions. Recently, a new CNN architecture, Densely Connected Convolutional Networks (DenseNets), has shown excellent results on image classification tasks. The idea of DenseNets is based on the observation that if each layer is directly connected to every other layer in a feed-forward fashion then the network will be more accurate and easier to train. In this paper, we extend DenseNets to deal with the problem of semantic segmentation. We achieve state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any post-processing module or pretraining. Moreover, due to the smart construction of the model, our approach has far fewer parameters than the currently published best entries for these datasets.
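The dense connectivity the segmentation model inherits from DenseNets is easy to express in code. Here is a minimal, hedged PyTorch sketch of a dense block; the growth rate and depth are illustrative, not the paper's settings:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer's output is concatenated to its input, so every layer
    sees the features of all preceding layers."""
    def __init__(self, in_ch, growth=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1),
            )
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)   # dense connectivity
        return x
```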

1,163 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: A novel single-image super-resolution method is presented by introducing dense skip connections in a very deep network, providing an effective way to combine the low-level features and high- level features to boost the reconstruction performance.
Abstract: Recent studies have shown that the performance of single-image super-resolution methods can be significantly boosted by using deep convolutional neural networks. In this study, we present a novel single-image super-resolution method by introducing dense skip connections in a very deep network. In the proposed network, the feature maps of each layer are propagated into all subsequent layers, providing an effective way to combine the low-level features and high-level features to boost the reconstruction performance. In addition, the dense skip connections in the network enable short paths to be built directly from the output to each layer, alleviating the vanishing-gradient problem of very deep networks. Moreover, deconvolution layers are integrated into the network to learn the upsampling filters and to speed up the reconstruction process. Further, the proposed method substantially reduces the number of parameters, enhancing the computational efficiency. We evaluate the proposed method using images from four benchmark datasets and set a new state of the art.

1,079 citations


Posted Content
TL;DR: The authors design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling.
Abstract: Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have led to significant improvement over previous semantic segmentation systems. Here we show how to improve pixel-wise semantic segmentation by manipulating convolution-related operations that are of both theoretical and practical value. First, we design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling. Second, we propose a hybrid dilated convolution (HDC) framework in the encoding phase. This framework 1) effectively enlarges the receptive fields (RF) of the network to aggregate global information; 2) alleviates what we call the "gridding issue" caused by the standard dilated convolution operation. We evaluate our approaches thoroughly on the Cityscapes dataset, and achieve a state-of-the-art result of 80.1% mIOU on the test set at the time of submission. We have also achieved state-of-the-art results on the KITTI road estimation benchmark and the PASCAL VOC2012 segmentation task. Our source code can be found at this https URL.
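DUC replaces interpolation with a convolution that predicts every subpixel position at once; the subsequent reshape corresponds to a pixel shuffle. A minimal hedged PyTorch sketch, with illustrative channel counts and downsampling factor:

```python
import torch
import torch.nn as nn

num_classes, r = 19, 8   # r: factor by which the feature map was downsampled
duc = nn.Sequential(
    # Predict num_classes logits for each of the r*r subpixel positions.
    nn.Conv2d(512, num_classes * r * r, kernel_size=3, padding=1),
    # Rearrange (C*r^2, H, W) -> (C, H*r, W*r): the dense upsampling step.
    nn.PixelShuffle(r),
)

feat = torch.randn(1, 512, 64, 64)   # encoder output at 1/8 resolution
logits = duc(feat)                   # (1, 19, 512, 512) pixel-level prediction
```

Because every output pixel gets its own learned logits, fine detail lost by bilinear upsampling can be recovered.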

589 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper proposes a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation, and demonstrates the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches.
Abstract: In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.
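The sparse convolution layer can be sketched as a normalized convolution: only observed pixels contribute, the response is renormalized by the local count of valid pixels, and the validity mask is propagated by max pooling. A hedged PyTorch sketch following the description above (details may differ from the paper's layer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.pad = k // 2
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=self.pad, bias=False)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.pool = nn.MaxPool2d(k, stride=1, padding=self.pad)   # mask propagation
        self.register_buffer("ones", torch.ones(1, 1, k, k))

    def forward(self, x, mask):
        # mask: (B, 1, H, W); 1 where a laser measurement exists, 0 where missing
        num = self.conv(x * mask)                                  # observed pixels only
        den = F.conv2d(mask, self.ones, padding=self.pad).clamp(min=1e-8)
        return num / den + self.bias.view(1, -1, 1, 1), self.pool(mask)
```

The division by the convolved mask is what makes the layer invariant to the sparsity level: the response depends on the valid inputs, not on how many there are.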

518 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: A novel deep fully convolutional network model for accurate salient object detection and an effective hybrid upsampling method to reduce the checkerboard artifacts of deconvolution operators in the authors' decoder network are proposed.
Abstract: Deep convolutional neural networks (CNNs) have delivered superior performance in many computer vision tasks. In this paper, we propose a novel deep fully convolutional network model for accurate salient object detection. The key contribution of this work is to learn deep uncertain convolutional features (UCF), which encourage the robustness and accuracy of saliency detection. We achieve this via introducing a reformulated dropout (R-dropout) after specific convolutional layers to construct an uncertain ensemble of internal feature units. In addition, we propose an effective hybrid upsampling method to reduce the checkerboard artifacts of deconvolution operators in our decoder network. The proposed methods can also be applied to other deep convolutional networks. Compared with existing saliency detection methods, the proposed UCF model is able to incorporate uncertainties for more accurate object boundary inference. Extensive experiments demonstrate that our proposed saliency model performs favorably against state-of-the-art approaches. The uncertain feature learning mechanism as well as the upsampling method can significantly improve performance on other pixel-wise vision tasks.
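The paper's exact hybrid formulation is not reproduced here; the hedged sketch below shows one common way to combine the two ingredients it mixes: a learned deconvolution branch and an interpolate-then-convolve branch, whose blend tends to suppress checkerboard artifacts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridUp(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)  # artifact-prone branch
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        a = self.deconv(x)
        b = self.conv(F.interpolate(x, scale_factor=2, mode="bilinear",
                                    align_corners=False))  # smooth resize-convolution branch
        return 0.5 * (a + b)   # averaging the branches damps checkerboard patterns
```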

433 citations


Posted Content
TL;DR: A deep fully convolutional network model is proposed for accurate salient object detection, which incorporates uncertainties for more accurate object boundary inference by introducing a reformulated dropout (R-dropout).
Abstract: Deep convolutional neural networks (CNNs) have delivered superior performance in many computer vision tasks. In this paper, we propose a novel deep fully convolutional network model for accurate salient object detection. The key contribution of this work is to learn deep uncertain convolutional features (UCF), which encourage the robustness and accuracy of saliency detection. We achieve this via introducing a reformulated dropout (R-dropout) after specific convolutional layers to construct an uncertain ensemble of internal feature units. In addition, we propose an effective hybrid upsampling method to reduce the checkerboard artifacts of deconvolution operators in our decoder network. The proposed methods can also be applied to other deep convolutional networks. Compared with existing saliency detection methods, the proposed UCF model is able to incorporate uncertainties for more accurate object boundary inference. Extensive experiments demonstrate that our proposed saliency model performs favorably against state-of-the-art approaches. The uncertain feature learning mechanism as well as the upsampling method can significantly improve performance on other pixel-wise vision tasks.

267 citations


Posted Content
TL;DR: In this article, the location of missing data is considered in the convolutional layer of the network and a simple sparse convolution layer is proposed for depth upsampling from sparse laser scan data.
Abstract: In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.

236 citations


Journal ArticleDOI
TL;DR: A new strategy to increase the speed of FSI by two orders of magnitude is reported, which binarizes the Fourier basis patterns based on upsampling and error diffusion dithering; it may find broad imaging applications at wavebands that are not accessible using conventional two-dimensional image sensors.
Abstract: Fourier single-pixel imaging (FSI) employs Fourier basis patterns for encoding spatial information and is capable of reconstructing high-quality two-dimensional and three-dimensional images. Fourier-domain sparsity in natural scenes allows FSI to recover sharp images from undersampled data. The original FSI demonstration, however, requires grayscale Fourier basis patterns for illumination. This requirement imposes a limitation on the imaging speed as digital micro-mirror devices (DMDs) generate grayscale patterns at a low refreshing rate. In this paper, we report a new strategy to increase the speed of FSI by two orders of magnitude. In this strategy, we binarize the Fourier basis patterns based on upsampling and error diffusion dithering. We demonstrate a 20,000 Hz projection rate using a DMD and capture 256-by-256-pixel dynamic scenes at a speed of 10 frames per second. The reported technique substantially accelerates image acquisition speed of FSI. It may find broad imaging applications at wavebands that are not accessible using conventional two-dimensional image sensors.
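The binarization step described above (upsampling plus error diffusion) can be illustrated with Floyd-Steinberg dithering, a standard error-diffusion kernel. The paper does not specify these exact choices of interpolation and diffusion kernel, so treat this numpy sketch as an assumption-laden illustration:

```python
import numpy as np

def binarize_pattern(pattern, factor=2):
    """Upsample a grayscale pattern in [0, 1], then Floyd-Steinberg dither to binary."""
    img = np.kron(pattern, np.ones((factor, factor)))   # nearest-neighbour upsampling
    h, w = img.shape
    out = img.astype(np.float64)
    for y in range(h):
        for x in range(w):
            old = out[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y, x] = new
            err = old - new
            # Diffuse the quantization error to unvisited neighbours.
            if x + 1 < w:
                out[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    out[y + 1, x - 1] += err * 3 / 16
                out[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    out[y + 1, x + 1] += err * 1 / 16
    return out.astype(np.uint8)   # binary pattern suitable for fast DMD display
```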

Journal ArticleDOI
TL;DR: This paper shows that using a bicubic interpolated depth map can blur depth discontinuities when the upsampling factor is large and the input depth map contains large holes and heavy noise, and proposes a robust optimization framework for color guided depth map restoration that performs well in suppressing texture copy artifacts.
Abstract: One of the most challenging issues in color guided depth map restoration is the inconsistency between color edges in guidance color images and depth discontinuities on depth maps. This makes the restored depth map suffer from texture copy artifacts and blurring depth discontinuities. To handle this problem, most state-of-the-art methods design complex guidance weight based on guidance color images and heuristically make use of the bicubic interpolation of the input depth map. In this paper, we show that using bicubic interpolated depth map can blur depth discontinuities when the upsampling factor is large and the input depth map contains large holes and heavy noise. In contrast, we propose a robust optimization framework for color guided depth map restoration. By adopting a robust penalty function to model the smoothness term of our model, we show that the proposed method is robust against the inconsistency between color edges and depth discontinuities even when we use simple guidance weight. To the best of our knowledge, we are the first to solve this problem with a principled mathematical formulation rather than previous heuristic weighting schemes. The proposed robust method performs well in suppressing texture copy artifacts. Moreover, it can better preserve sharp depth discontinuities than previous heuristic weighting schemes. Through comprehensive experiments on both simulated data and real data, we show promising performance of the proposed method.

Proceedings Article
12 Feb 2017
TL;DR: An end-to-end transformative discriminative neural network devised for super-resolving unaligned and very small face images with an extreme upscaling factor of 8, which significantly outperforms the state-of-the-art.
Abstract: Conventional face hallucination methods rely heavily on accurate alignment of low-resolution (LR) faces before upsampling them. Misalignment often leads to deficient results and unnatural artifacts for large upscaling factors. However, due to the diverse range of poses and different facial expressions, aligning an LR input image, in particular when it is tiny, is severely difficult. To overcome this challenge, here we present an end-to-end transformative discriminative neural network (TDN) devised for super-resolving unaligned and very small face images with an extreme upscaling factor of 8. Our method employs an upsampling network where we embed spatial transformation layers to allow local receptive fields to line up with similar spatial supports. Furthermore, we incorporate a class-specific loss in our objective through a successive discriminative network to improve the alignment and upsampling performance with semantic information. Extensive experiments on large face datasets show that the proposed method significantly outperforms the state-of-the-art.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A weighted analysis representation model for guided depth image enhancement is proposed, which advances the conventional methods in two aspects: task driven learning and dynamic guidance.
Abstract: The depth images acquired by consumer depth sensors (e.g., Kinect and ToF) are usually of low resolution and insufficient quality. One natural solution is to incorporate a high-resolution RGB camera for exploiting their statistical correlation. However, most existing methods are intuitive and limited in characterizing the complex and dynamic dependency between intensity and depth images. To address these limitations, we propose a weighted analysis representation model for guided depth image enhancement, which advances the conventional methods in two aspects: (i) task-driven learning and (ii) dynamic guidance. First, we generalize the analysis representation model by including a guided weight function for dependency modeling. The task-driven learning formulation is then introduced to obtain the optimized guidance tailored to the specific enhancement task. Second, the depth image is gradually enhanced along with the iterations, and thus the guidance should also be dynamically adjusted to account for the updating of the depth image. To this end, stage-wise parameters are learned for dynamic guidance. Experiments on guided depth image upsampling and noisy depth image restoration validate the effectiveness of our method.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The proposed architecture, the Recursively Branched Deconvolutional Network (RBDN), develops a cheap multi-context image representation very early on using an efficient recursive branching scheme with extensive parameter sharing and learnable upsampling.
Abstract: We present a Deep Convolutional Neural Network architecture which serves as a generic image-to-image regressor that can be trained end-to-end without any further machinery. Our proposed architecture, the Recursively Branched Deconvolutional Network (RBDN), develops a cheap multi-context image representation very early on using an efficient recursive branching scheme with extensive parameter sharing and learnable upsampling. This multi-context representation is subjected to a highly non-linear locality preserving transformation by the remainder of our network, comprising a series of convolutions/deconvolutions without any spatial downsampling. The RBDN architecture is fully convolutional and can handle variable sized images during inference. We provide qualitative/quantitative results on 3 diverse tasks: relighting, denoising and colorization, and show that our proposed RBDN architecture obtains comparable results to the state-of-the-art on each of these tasks when used off-the-shelf without any post processing or task-specific architectural modifications.

Proceedings ArticleDOI
18 Jul 2017
TL;DR: In this paper, the authors present an extensive comparison of a variety of decoders for pixel-wise prediction tasks and identify two decoder types which give a consistently high performance.
Abstract: Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders, which decrease spatial resolution while learning a high-dimensional representation, followed by decoders, which recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. Therefore this paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce a novel decoder: bilinear additive upsampling. (3) We introduce new residual-like connections for decoders. (4) We identify two decoder types which give a consistently high performance.
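The novel decoder named in contribution (2) is simple to state in code: upsample bilinearly, then sum each group of N consecutive channels, reducing the channel count without adding parameters. A hedged PyTorch sketch, with an illustrative group size:

```python
import torch
import torch.nn.functional as F

def bilinear_additive_upsample(x, scale=2, group=4):
    """Bilinear upsampling followed by summing groups of channels."""
    b, c, h, w = x.shape
    assert c % group == 0, "channel count must be divisible by the group size"
    x = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
    return x.view(b, c // group, group, h * scale, w * scale).sum(dim=2)

y = bilinear_additive_upsample(torch.randn(1, 64, 16, 16))   # -> (1, 16, 32, 32)
```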

Journal ArticleDOI
TL;DR: Experimental results show that robustness is achieved by recovering satisfactory watermark data from the reconstructed cover image after applying common geometric transformation attacks, common enhancement technique attacks (like lowpass filtering, histogram equalization, sharpening, gamma correction, noise addition etc.) and JPEG compression attacks.
Abstract: To compromise between the imperceptibility and robustness properties of robust image watermarking techniques, an RDWT-DCT based blind image watermarking scheme using Arnold scrambling is presented in this paper. Firstly, RDWT (Redundant Discrete Wavelet Transform) is applied to each grayscale cover image block after the image is decomposed into fixed-size non-overlapping blocks. Secondly, the binary watermark logo is encrypted by an Arnold chaotic map and reshaped to a sequence to improve the security of the logo. In the subsequent step, DCT (Discrete Cosine Transform) is employed on each LH subband of the non-overlapping host image blocks. Finally, after zigzag scanning of each DCT block, a binary bit of the watermark is embedded into each block by adjusting some middle significant AC coefficients using a repetition code. Experimental results show that robustness is achieved by recovering satisfactory watermark data from the reconstructed cover image after applying common geometric transformation attacks (like rotation, cropping, scaling, shearing and deletion of lines or columns etc.), common enhancement technique attacks (like lowpass filtering, histogram equalization, sharpening, gamma correction, noise addition etc.) and JPEG compression attacks. The proposed scheme is also tested to verify its robustness against the standard benchmark software "Checkmark", and satisfactory results are achieved against Checkmark attacks such as Hard and Soft Thresholding, Template Removal, Warping, Dithering, Remodulation and Downsampling/Upsampling etc.
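The Arnold scrambling step is the classical cat map iterated over the square watermark logo. A minimal numpy sketch; the iteration count acts as a key-like parameter, and this illustrates only the scrambling, not the paper's full embedding pipeline:

```python
import numpy as np

def arnold_scramble(img, iterations=10):
    """Arnold cat map (x, y) -> ((x + y) mod n, (x + 2y) mod n), iterated."""
    n = img.shape[0]                 # assumes a square (n x n) binary logo
    out = img.copy()
    for _ in range(iterations):
        nxt = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out   # descrambling applies the inverse map (or completes the map's period)
```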

Journal ArticleDOI
TL;DR: Numerical simulations and experimental measurements show that sparse deconvolution can be considered an effective tool for terahertz nondestructive characterization of multilayered structures.
Abstract: Terahertz sparse deconvolution based on an iterative shrinkage algorithm is presented in this study to characterize multilayered structures. With an upsampling approach, sparse deconvolution with superresolution is developed to overcome the time resolution limited by the sampling period in the measurement and increase the precision of the estimation of echo arrival times. A simple but effective time-domain model for describing the temporal pulse spreading due to the frequency-dependent loss is also designed and introduced into the algorithm, which greatly improves the performance of sparse deconvolution in processing time-varying pulses during the propagation of terahertz waves in materials. Numerical simulations and experimental measurements verify the algorithms and show that sparse deconvolution can be considered an effective tool for terahertz nondestructive characterization of multilayered structures.
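The iterative shrinkage idea behind the sparse deconvolution can be sketched as plain ISTA on a circular convolution model y = h * x; the paper's superresolution upsampling and pulse-spreading model are omitted, so this numpy sketch is an assumption-laden baseline:

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_deconv(y, h, lam=0.05, n_iter=200):
    """ISTA for min_x 0.5*||y - Hx||^2 + lam*||x||_1, H a circular convolution."""
    n = len(y)
    h_pad = np.pad(h, (0, n - len(h)))
    H = np.stack([np.roll(h_pad, k) for k in range(n)], axis=1)  # columns = shifted pulses
    L = np.linalg.norm(H, 2) ** 2          # Lipschitz constant of the data-fit gradient
    x = np.zeros(n)
    for _ in range(n_iter):
        x = soft_threshold(x + H.T @ (y - H @ x) / L, lam / L)   # gradient step + shrinkage
    return x                               # sparse echo sequence (arrival times, amplitudes)
```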

Posted Content
18 Jul 2017
TL;DR: In this article, the authors present an extensive comparison of a variety of decoders for pixel-wise prediction tasks and identify two decoder types which give a consistently high performance.
Abstract: Many machine vision applications require predictions for every pixel of the input image (for example semantic segmentation, boundary detection). Models for such problems usually consist of encoders, which decrease spatial resolution while learning a high-dimensional representation, followed by decoders, which recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. Therefore this paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise prediction tasks. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce a novel decoder: bilinear additive upsampling. (3) We introduce new residual-like connections for decoders. (4) We identify two decoder types which give a consistently high performance.

Journal ArticleDOI
TL;DR: This work proposes a multiframe super-resolution reconstruction technique based on sparse representation of MR images that can reduce through-plane partial volume artifact by combining multiple orthogonal MR scans, and thus can potentially improve medical image analysis, research, and clinical diagnosis.
Abstract: In magnetic resonance (MR), hardware limitations, scan time constraints, and patient movement often result in the acquisition of anisotropic 3-D MR images with limited spatial resolution in the out-of-plane views. Our goal is to construct an isotropic high-resolution (HR) 3-D MR image through upsampling and fusion of orthogonal anisotropic input scans. We propose a multiframe super-resolution (SR) reconstruction technique based on sparse representation of MR images. Our proposed algorithm exploits the correspondence between the HR slices and the low-resolution (LR) sections of the orthogonal input scans as well as the self-similarity of each input scan to train pairs of overcomplete dictionaries that are used in a sparse-land local model to upsample the input scans. The upsampled images are then combined using wavelet fusion and error backprojection to reconstruct an image. Features are learned from the data and no extra training set is needed. Qualitative and quantitative analyses were conducted to evaluate the proposed algorithm using simulated and clinical MR scans. Experimental results show that the proposed algorithm achieves promising results in terms of peak signal-to-noise ratio, structural similarity image index, intensity profiles, and visualization of small structures obscured in the LR imaging process due to partial volume effects. Our novel SR algorithm outperforms the nonlocal means (NLM) method using self-similarity, NLM method using self-similarity and image prior, self-training dictionary learning-based SR method, averaging of upsampled scans, and the wavelet fusion method. Our SR algorithm can reduce through-plane partial volume artifact by combining multiple orthogonal MR scans, and thus can potentially improve medical image analysis, research, and clinical diagnosis.

Patent
10 Mar 2017
TL;DR: In this article, a system and method for semantic segmentation using dense upsampling convolution (DUC) is described, in which the label map is divided into equal subparts, which have the same height and width as the feature map.
Abstract: A system and method for semantic segmentation using dense upsampling convolution (DUC) are disclosed. A particular embodiment includes: receiving an input image; producing a feature map from the input image; performing a convolution operation on the feature map and reshaping it to produce a label map; dividing the label map into equal subparts, which have the same height and width as the feature map; stacking the subparts of the label map to produce a whole label map; and applying a convolution operation directly between the feature map and the whole label map without inserting extra values in deconvolutional layers to produce a semantic label map.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: In this paper, a downsampled version of the input signal is used to generate the output signal through upsampling, which yields the best known audio-only performance on the RECOLA dataset.
Abstract: The goal of continuous emotion recognition is to assign an emotion value to every frame in a sequence of acoustic features. We show that incorporating long-term temporal dependencies is critical for continuous emotion recognition tasks. To this end, we first investigate architectures that use dilated convolutions. We show that even though such architectures outperform previously reported systems, the output signals produced from such architectures undergo erratic changes between consecutive time steps. This is inconsistent with the slow moving ground-truth emotion labels that are obtained from human annotators. To deal with this problem, we model a downsampled version of the input signal and then generate the output signal through upsampling. Not only does the resulting downsampling/upsampling network achieve good performance, it also generates smooth output trajectories. Our method yields the best known audio-only performance on the RECOLA dataset.
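The paper performs the upsampling inside the network; as a standalone illustration of why predicting at a reduced rate smooths trajectories, here is a hedged numpy sketch that converts coarse-rate predictions back to the frame rate with simple linear interpolation:

```python
import numpy as np

def upsample_trajectory(coarse, factor):
    """Linearly interpolate predictions made at 1/factor of the frame rate."""
    t_coarse = np.arange(len(coarse)) * factor
    t_fine = np.arange(t_coarse[-1] + 1)
    return np.interp(t_fine, t_coarse, coarse)

smooth = upsample_trajectory(np.array([0.1, 0.4, 0.3]), factor=4)  # 9 frames, no abrupt jumps
```

Any interpolation of a slowly sampled signal is inherently smooth between samples, which matches the slow-moving ground-truth labels better than erratic frame-rate outputs.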

Posted Content
TL;DR: This work proposes a principled convolutional neural pyramid (CNP) framework for general low-level vision and image processing tasks based on the essential finding that many applications require large receptive fields for structure understanding.
Abstract: We propose a principled convolutional neural pyramid (CNP) framework for general low-level vision and image processing tasks. It is based on the essential finding that many applications require large receptive fields for structure understanding. But corresponding neural networks for regression either stack many layers or apply large kernels to achieve it, which is computationally very costly. Our pyramid structure can greatly enlarge the field while not sacrificing computation efficiency. Extra benefits include adaptive network depth and progressive upsampling for quasi-realtime testing on VGA-size input. Our method benefits a broad set of applications, such as depth/RGB image restoration, completion, noise/artifact removal, edge refinement, image filtering, image enhancement and colorization.

Proceedings ArticleDOI
12 Sep 2017
TL;DR: This method is based on an experimental investigation of the dependence between the QP threshold, which determines when to encode lower resolution frames, and the distortion obtained after downsampling/upsampling.
Abstract: In this paper, a novel spatial resolution adaptation approach for video compression is proposed. Its ability to dynamically apply downsampling to frames exhibiting low spatial detail delivers improved rate-distortion performance, together with a reduction in the computational complexity of the encoding process. This method is based on an experimental investigation of the dependence between the QP threshold, which determines when to encode lower resolution frames, and the distortion obtained after downsampling/upsampling. The proposed approach is integrated with the High Efficiency Video Coding (HEVC) reference codec for intra coding, and evaluated on 15 high-resolution test sequences with varying levels of spatial detail. The results show promising average bitrate savings of approximately 4% (B-D measurements), and a significant complexity reduction (29% on average).

Book ChapterDOI
10 Sep 2017
TL;DR: In this paper, a context-sensitive upsampling method based on a residual convolutional neural network model was proposed that learns organ-specific appearance and adapts semantically to input data, allowing for the generation of high-resolution images with sharp edges and fine-scale detail.
Abstract: 3D Magnetic Resonance Imaging (MRI) is often a trade-off between fast but low-resolution image acquisition and highly detailed but slow image acquisition. Fast imaging is required for targets that move, to avoid motion artefacts. This is particularly difficult for fetal MRI. Spatially independent upsampling techniques, which are the state of the art for addressing this problem, are error prone and disregard contextual information. In this paper we propose a context-sensitive upsampling method based on a residual convolutional neural network model that learns organ-specific appearance and adapts semantically to input data, allowing for the generation of high-resolution images with sharp edges and fine-scale detail. By making contextual decisions about appearance and shape, present in different parts of an image, we gain a maximum of structural detail at a similar contrast as provided by high-resolution data. We experiment on 145 fetal scans and show that our approach yields an increased PSNR of 1.25 dB when applied to under-sampled fetal data, compared to baseline upsampling. Furthermore, our method yields an increased PSNR of 1.73 dB when utilizing under-sampled fetal data to perform brain volume reconstruction on motion-corrupted captured data.

Proceedings ArticleDOI
17 Sep 2017
TL;DR: A novel and efficient method for brain tumor (and sub-region) segmentation in multimodal MR images based on a fully convolutional network (FCN) that enables end-to-end training and fast inference.
Abstract: In this paper, we present a novel and efficient method for brain tumor (and sub-region) segmentation in multimodal MR images based on a fully convolutional network (FCN) that enables end-to-end training and fast inference. Our structure consists of a downsampling path and three upsampling paths, which extract multi-level contextual information by concatenating hierarchical feature representations from each upsampling path. Meanwhile, we introduce a symmetry-driven FCN by proposing the use of symmetry difference images. The model was evaluated on the Brain Tumor Image Segmentation Benchmark (BRATS) 2013 challenge dataset and achieved state-of-the-art results at a lower computational cost than competing methods.

Journal ArticleDOI
TL;DR: This paper proposes a new approach to address these issues in a unified framework for depth map restoration, based on sparse representation, and suggests an alternative method of reconstructing a dense depth map from very sparse non-uniformly sampled depth data by sequential cascading of uniform and non-uniform upsampling techniques.
Abstract: A depth map sensed by a low-cost active sensor is often limited in resolution, whereas depth information achieved from structure from motion or sparse depth scanning techniques may result in a sparse point cloud. Achieving a high-resolution (HR) depth map from a low-resolution (LR) depth map or densely reconstructing a sparse non-uniformly sampled depth map are fundamentally similar problems with different types of upsampling requirements. The first problem involves upsampling in a uniform grid, whereas the second type of problem requires upsampling in a non-uniform grid. In this paper, we propose a new approach to address such issues in a unified framework, based on sparse representation. Unlike most depth map restoration approaches, ours does not require an HR intensity image. Based on example depth maps, sub-dictionaries of exemplars are constructed and used to restore the HR/dense depth map. In the case of uniform upsampling of an LR depth map, an edge preserving constraint is used for preserving the discontinuity present in the depth map, and a pyramidal reconstruction strategy is applied in order to deal with higher upsampling factors. For upsampling of a non-uniformly sampled sparse depth map, we compute the missing information in local patches from similar exemplars. Furthermore, we also suggest an alternative method of reconstructing a dense depth map from very sparse non-uniformly sampled depth data by sequential cascading of uniform and non-uniform upsampling techniques. We provide a variety of qualitative and quantitative results to demonstrate the efficacy of our approach for depth map restoration.

Journal ArticleDOI
01 May 2017
TL;DR: This paper reviews the approaches that couple ToF depth images with high-resolution optical images and provides an overview of performance evaluation tests presented in the related studies.
Abstract: Recently, there has been remarkable growth of interest in the development and applications of time-of-flight (ToF) depth cameras. Despite the permanent improvement of their characteristics, the practical applicability of ToF cameras is still limited by low resolution and quality of depth measurements. This has motivated many researchers to combine ToF cameras with other sensors in order to enhance and upsample depth images. In this paper, we review the approaches that couple ToF depth images with high-resolution optical images. Other classes of upsampling methods are also briefly discussed. Finally, we provide an overview of performance evaluation tests presented in the related studies.

Book ChapterDOI
14 Sep 2017
TL;DR: This work proposes an architecture, based on the recently proposed Densenet, for semantic segmentation, in which pooling has been replaced with dilated convolutions, and presents results on the validation dataset of the Multimodal Brain Tumor Segmentation Challenge 2017.
Abstract: Segmentation of medical images requires multi-scale information, combining local boundary detection with global context. State-of-the-art convolutional neural network (CNN) architectures for semantic segmentation are often composed of a downsampling path which computes features at multiple scales, followed by an upsampling path, required to recover those features at the same scale as the input image. Skip connections allow features discovered in the downward path to be integrated in the upward path. The downsampling mechanism is typically a pooling operation. However, pooling was introduced in CNNs to enable translation invariance, which is not desirable in segmentation tasks. For this reason, we propose an architecture, based on the recently proposed Densenet, for semantic segmentation, in which pooling has been replaced with dilated convolutions. We also present a variant approach, used in the 2017 BRATS challenge, in which a cascade of densely connected nets is used to first exclude non-brain tissue, and then segment tumor structures. We present results on the validation dataset of the Multimodal Brain Tumor Segmentation Challenge 2017.
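The pooling-to-dilation substitution at the heart of this proposal is a one-line change in most frameworks. A hedged PyTorch sketch with illustrative channel sizes:

```python
import torch.nn as nn

# Standard stage: convolution followed by pooling halves the resolution.
pooled = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.MaxPool2d(2))

# Dilated alternative: comparable receptive-field growth at full spatial
# resolution, so no upsampling path is needed to undo the pooling.
dilated = nn.Conv2d(64, 64, 3, padding=2, dilation=2)
```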

Journal ArticleDOI
TL;DR: This paper simulations Landsat scenes to evaluate a subpixel registration process based on phase correlation and the upsampling of the Fourier transform, and shows that image size affects the cross correlation results, but for images equal or larger than 100 × 100 pixels similar accuracies are expected.
Abstract: Multi-temporal analysis is one of the main applications of remote sensing, and Landsat imagery has been one of the main resources for many years. However, the moderate spatial resolution (30 m) restricts their use for high precision applications. In this paper, we simulate Landsat scenes to evaluate, by means of an exhaustive number of tests, a subpixel registration process based on phase correlation and the upsampling of the Fourier transform. From a high resolution image (0.5 m), two sets of 121 synthetic images of fixed translations are created to simulate Landsat scenes (30 m). In this sense, the use of the point spread function (PSF) of the Landsat TM (Thematic Mapper) sensor in the downsampling process improves the results compared to those obtained by simple averaging. In the process of obtaining sub-pixel accuracy by upsampling the cross correlation matrix by a certain factor, the limit of improvement is achieved at 0.1 pixels. We show that image size affects the cross correlation results, but for images equal or larger than 100 × 100 pixels similar accuracies are expected. The large dataset used in the tests allows us to describe the intra-pixel distribution of the errors obtained in the registration process and how they follow a waveform instead of random/stochastic behavior. The amplitude of this waveform, representing the highest expected error, is estimated at 1.88 m. Finally, a validation test is performed over a set of sub-pixel shorelines obtained from actual Landsat-5 TM, Landsat-7 ETM+ (Enhanced Thematic Mapper Plus) and Landsat-8 OLI (Operation Land Imager) scenes. The evaluation of the shoreline accuracy with respect to permanent seawalls, before and after the registration, shows the importance of the registering process and serves as a non-synthetic validation test that reinforce previous results.