
Showing papers on "Pixel published in 2016"


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

4,770 citations
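For illustration, a minimal PyTorch sketch of the sub-pixel convolution idea (layer sizes and the use of nn.PixelShuffle are my own choices, not necessarily the paper's exact configuration): the network operates entirely in LR space, and the final convolution emits r² feature maps per output channel, which are rearranged into the HR image.

```python
import torch
import torch.nn as nn

class ESPCNSketch(nn.Module):
    def __init__(self, upscale=3, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, 3, padding=1), nn.Tanh(),
            # the last conv emits upscale^2 feature maps per image channel...
            nn.Conv2d(32, channels * upscale ** 2, 3, padding=1),
        )
        # ...which the sub-pixel layer rearranges into the rH x rW output
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, lr):
        return self.shuffle(self.body(lr))

hr = ESPCNSketch()(torch.randn(1, 1, 120, 120))  # -> shape (1, 1, 360, 360)
```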


Proceedings Article
19 Jun 2016
TL;DR: A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.

1,801 citations
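The sequential, raster-scan dependency structure can be illustrated with the kind of causal mask used in the paper's masked convolutions (a sketch of the single-channel spatial mask only; the full model also masks dependencies among the R, G, B channels):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """k x k mask for a masked convolution: only pixels above, and to the
    left in the current row, are visible; type 'B' also keeps the center."""
    m = np.zeros((k, k), dtype=np.float32)
    m[:k // 2, :] = 1.0            # all rows above the center
    m[k // 2, :k // 2] = 1.0       # current row, strictly left of center
    if mask_type == "B":
        m[k // 2, k // 2] = 1.0    # center pixel itself
    return m

print(causal_mask(3, "A"))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```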


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work proposes an algorithm that is linear in the size of the image, deterministic, and requires no training; it performs well on a wide variety of images and is competitive with other state-of-the-art methods on the single image dehazing problem.
Abstract: Haze limits visibility and reduces image contrast in outdoor images. The degradation is different for every pixel and depends on the distance of the scene point from the camera. This dependency is expressed in the transmission coefficients, which control the scene attenuation and amount of haze in every pixel. Previous methods solve the single image dehazing problem using various patch-based priors. We, on the other hand, propose an algorithm based on a new, non-local prior. The algorithm relies on the assumption that colors of a haze-free image are well approximated by a few hundred distinct colors that form tight clusters in RGB space. Our key observation is that pixels in a given cluster are often non-local, i.e., they are spread over the entire image plane and are located at different distances from the camera. In the presence of haze these varying distances translate to different transmission coefficients. Therefore, each color cluster in the clear image becomes a line in RGB space, which we term a haze-line. Using these haze-lines, our algorithm recovers both the distance map and the haze-free image. The algorithm is linear in the size of the image, deterministic and requires no training. It performs well on a wide variety of images and is competitive with other state-of-the-art methods.

1,082 citations
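A minimal sketch of the transmission estimate implied by the haze-line model, assuming the airlight A is already known and substituting KMeans on the normalized I − A directions for the paper's own clustering scheme (the cluster count and the lower clipping bound here are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_transmission(img, A, n_lines=500):
    """img: HxWx3 float array; A: length-3 airlight vector."""
    ia = img.reshape(-1, 3) - A          # translate so the airlight is the origin
    r = np.linalg.norm(ia, axis=1) + 1e-9
    dirs = ia / r[:, None]               # unit direction = which haze-line
    labels = KMeans(n_clusters=n_lines, n_init=4).fit_predict(dirs)
    # the farthest pixel on each haze-line is assumed haze-free: t = r / r_max
    r_max = np.zeros(n_lines)
    np.maximum.at(r_max, labels, r)
    t = r / r_max[labels]
    return np.clip(t, 0.1, 1.0).reshape(img.shape[:2])
```

The haze-free image then follows by inverting the standard haze model I = tJ + (1 − t)A per pixel.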


Posted Content
TL;DR: In this paper, a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions is presented. The model captures the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image.
Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.

970 citations


Proceedings ArticleDOI
Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun
01 Jun 2016
TL;DR: This paper proposes to use scribbles to annotate images, and develops an algorithm to train convolutional networks for semantic segmentation supervised by scribbles.
Abstract: Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure. We note that for the topic of interactive image segmentation, scribbles are very widely used in academic research and commercial software, and are recognized as one of the most user-friendly ways of interacting. In this paper, we propose to use scribbles to annotate images, and develop an algorithm to train convolutional networks for semantic segmentation supervised by scribbles. Our algorithm is based on a graphical model that jointly propagates information from scribbles to unmarked pixels and learns network parameters. We present competitive object semantic segmentation results on the PASCAL VOC dataset by using scribbles as annotations. Scribbles are also favored for annotating stuff (e.g., water, sky, grass) that has no well-defined shape, and our method shows excellent results on the PASCAL-CONTEXT dataset thanks to extra inexpensive scribble annotations. Our scribble annotations on PASCAL VOC are available at http://research.microsoft.com/en-us/um/people/jifdai/downloads/scribble_sup.

748 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work introduces a linear approximation of the min operator to compute the dark channel, achieves state-of-the-art results on deblurring natural images, and compares favorably with methods that are well-engineered for specific scenarios.
Abstract: We present a simple and effective blind image deblurring method based on the dark channel prior. Our work is inspired by the interesting observation that the dark channel of blurred images is less sparse. While most image patches in the clean image contain some dark pixels, these pixels are not dark when averaged with neighboring high-intensity pixels during the blur process. This change in the sparsity of the dark channel is an inherent property of the blur process, which we both prove mathematically and validate using training data. Therefore, enforcing the sparsity of the dark channel helps blind deblurring in various scenarios, including natural, face, text, and low-illumination images. However, sparsity of the dark channel introduces a non-convex non-linear optimization problem. We introduce a linear approximation of the min operator to compute the dark channel. Our look-up-table-based method converges fast in practice and can be directly extended to non-uniform deblurring. Extensive experiments show that our method achieves state-of-the-art results on deblurring natural images and compares favorably with methods that are well-engineered for specific scenarios.

682 citations
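A minimal sketch of the dark channel itself, the quantity whose sparsity the prior enforces: per pixel, the minimum intensity over the color channels and over a local patch (the patch size here is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=35):
    """img: HxWx3 float array in [0, 1]; returns the HxW dark channel."""
    per_pixel_min = img.min(axis=2)                   # min over color channels
    return minimum_filter(per_pixel_min, size=patch)  # min over the local patch
```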


Journal ArticleDOI
TL;DR: Some popular and state-of-the-art fusion methods at different levels, especially the pixel level, are reviewed, and various approaches and metrics for assessing the fused product are presented.

574 citations


Posted Content
TL;DR: The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.
Abstract: We propose a novel direct sparse visual odometry formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry -- represented as inverse depth in a reference frame -- and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.

557 citations
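A minimal sketch of the even, gradient-based pixel sampling the abstract describes, simplified here to picking the strongest-gradient pixel in each cell of a coarse grid when it exceeds a threshold (cell size and threshold are my own illustrative values; the paper's selection strategy is more elaborate):

```python
import numpy as np

def select_pixels(gray, cell=16, thresh=8.0):
    """gray: 2D float array; returns (row, col) points spread evenly over
    all image regions that carry intensity gradient."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    pts = []
    h, w = gray.shape
    for y in range(0, h - cell, cell):
        for x in range(0, w - cell, cell):
            block = mag[y:y + cell, x:x + cell]
            dy, dx = np.unravel_index(block.argmax(), block.shape)
            if block[dy, dx] > thresh:          # keep only gradient-bearing cells
                pts.append((y + dy, x + dx))
    return np.array(pts)
```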


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper describes a local region in an image via a hierarchical Gaussian distribution whose parameters include both means and covariances, and shows that the proposed descriptor exhibits remarkably high performance, outperforming the state-of-the-art descriptors for person re-identification.
Abstract: Describing the color and textural information of a person image is one of the most crucial aspects of person re-identification. In this paper, we present a novel descriptor based on a hierarchical distribution of pixel features. A hierarchical covariance descriptor has been successfully applied for image classification. However, the mean information of pixel features, which is absent in covariance, tends to be major discriminative information of person images. To solve this problem, we describe a local region in an image via a hierarchical Gaussian distribution in which both means and covariances are included in its parameters. More specifically, we model the region as a set of multiple Gaussian distributions in which each Gaussian represents the appearance of a local patch. The characteristics of the set of Gaussians are again described by another Gaussian distribution. In both steps, unlike the hierarchical covariance descriptor, the proposed descriptor can model both the mean and the covariance information of pixel features properly. The results of experiments conducted on five databases indicate that the proposed descriptor exhibits remarkably high performance which outperforms the state-of-the-art descriptors for person re-identification.

554 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper proposes an end-to-end deep contrast network consisting of two complementary components: a pixel-level fully convolutional stream and a segment-wise spatial pooling stream.
Abstract: Salient object detection has recently witnessed substantial progress due to powerful features extracted using deep convolutional neural networks (CNNs). However, existing CNN-based methods operate at the patch level instead of the pixel level. Resulting saliency maps are typically blurry, especially near the boundary of salient objects. Furthermore, image patches are treated as independent samples even when they are overlapping, giving rise to significant redundancy in computation and storage. In this paper, we propose an end-to-end deep contrast network to overcome the aforementioned limitations. Our deep network consists of two complementary components, a pixel-level fully convolutional stream and a segment-wise spatial pooling stream. The first stream directly produces a saliency map with pixel-level accuracy from an input image. The second stream extracts segment-wise features very efficiently, and better models saliency discontinuities along object boundaries. Finally, a fully connected CRF model can be optionally incorporated to improve spatial coherence and contour localization in the fused result from these two streams. Experimental results demonstrate that our deep model significantly improves the state of the art.

553 citations


Journal ArticleDOI
Qiaoliang Li, Bowei Feng, Linpei Xie, Ping Liang, Huisheng Zhang, Tianfu Wang
TL;DR: A wide and deep neural network with strong induction ability is proposed to model the transformation, and an efficient training strategy is presented, where instead of a single label of the center pixel, the network can output the label map of all pixels for a given image patch.
Abstract: This paper presents a new supervised method for vessel segmentation in retinal images. This method remolds the task of segmentation as a problem of cross-modality data transformation from retinal image to vessel map. A wide and deep neural network with strong induction ability is proposed to model the transformation, and an efficient training strategy is presented. Instead of a single label of the center pixel, the network can output the label map of all pixels for a given image patch. Our approach outperforms reported state-of-the-art methods in terms of sensitivity, specificity and accuracy. The result of cross-training evaluation indicates its robustness to the training set. The approach needs no artificially designed feature and no preprocessing step, reducing the impact of subjective factors. The proposed method has the potential for application in image diagnosis of ophthalmologic diseases, and it may provide a new, general, high-performance computing framework for image segmentation.

Journal ArticleDOI
TL;DR: Shake-The-Box is a Lagrangian tracking method that uses a prediction of the particle distribution for the subsequent time-step as a means to seize the temporal domain.
Abstract: A Lagrangian tracking method is introduced, which uses a prediction of the particle distribution for the subsequent time-step as a means to seize the temporal domain. Errors introduced by the prediction process are corrected by an image matching technique (‘shaking’ the particle in space), followed by an iterative triangulation of particles newly entering the measurement domain. The scheme was termed ‘Shake-The-Box’ and previously characterized as ‘4D-PTV’ due to the strong interaction with the temporal dimension. Trajectories of tracer particles are identified at high spatial accuracy due to a nearly complete suppression of ghost particles; a temporal filtering scheme further improves on accuracy and allows for the extraction of local velocity and acceleration as derivatives of a continuous function. Exploiting the temporal information enables the processing of densely seeded flows (beyond 0.1 particles per pixel, ppp), which were previously reserved for tomographic PIV evaluations. While TOMO-PIV uses statistical means to evaluate the flow (building an ‘anonymous’ voxel space with subsequent spatial averaging of the velocity information using correlation), the Shake-The-Box approach is able to identify and track individual particles at numbers of tens or even hundreds of thousands per time-step. The method is outlined in detail, followed by descriptions of applications to synthetic and experimental data. The synthetic data evaluation reveals that STB is able to capture virtually all true particles, while effectively suppressing the formation of ghost particles. For the examined four-camera set-up, particle image densities N_I up to 0.125 ppp could be processed. For noise-free images, the attained accuracy is very high. The addition of synthetic noise reduces usable particle image density (N_I ≤ 0.075 ppp for highly noisy images) and accuracy (still being significantly higher compared to tomographic reconstruction). The solutions remain virtually free of ghost particles. Processing an experimental data set on a transitional jet in water demonstrates the benefits of advanced Lagrangian evaluation in describing flow details—both on small scales (by the individual tracks) and on larger structures (using an interpolation onto an Eulerian grid). Comparisons to standard TOMO-PIV processing for synthetic and experimental evaluations show distinct benefits in local accuracy, completeness of the solution, ghost particle occurrence, spatial resolution, temporal coherence and computational effort.

Journal ArticleDOI
11 Nov 2016
TL;DR: A computational photography pipeline that captures, aligns, and merges a burst of frames to reduce noise and increase dynamic range, built atop Android's Camera2 API and written in the Halide domain-specific language (DSL).
Abstract: Cell phone cameras have small apertures, which limits the number of photons they can gather, leading to noisy images in low light. They also have small sensor pixels, which limits the number of electrons each pixel can store, leading to limited dynamic range. We describe a computational photography pipeline that captures, aligns, and merges a burst of frames to reduce noise and increase dynamic range. Our system has several key features that help make it robust and efficient. First, we do not use bracketed exposures. Instead, we capture frames of constant exposure, which makes alignment more robust, and we set this exposure low enough to avoid blowing out highlights. The resulting merged image has clean shadows and high bit depth, allowing us to apply standard HDR tone mapping methods. Second, we begin from Bayer raw frames rather than the demosaicked RGB (or YUV) frames produced by hardware Image Signal Processors (ISPs) common on mobile platforms. This gives us more bits per pixel and allows us to circumvent the ISP's unwanted tone mapping and spatial denoising. Third, we use a novel FFT-based alignment algorithm and a hybrid 2D/3D Wiener filter to denoise and merge the frames in a burst. Our implementation is built atop Android's Camera2 API, which provides per-frame camera control and access to raw imagery, and is written in the Halide domain-specific language (DSL). It runs in 4 seconds on device (for a 12 Mpix image), requires no user intervention, and ships on several mass-produced cell phones.
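For flavor, a sketch of FFT-based translational alignment using plain phase correlation; this is a classical stand-in for illustration, not the paper's coarse-to-fine tile-based algorithm:

```python
import numpy as np

def phase_correlate(ref, alt):
    """Estimate the integer (dy, dx) shift aligning `alt` to `ref` (2D arrays)."""
    F1, F2 = np.fft.fft2(ref), np.fft.fft2(alt)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12           # keep phase only
    corr = np.fft.ifft2(cross).real          # impulse at the relative shift
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    h, w = ref.shape
    return (dy - h if dy > h // 2 else dy,   # wrap to signed shifts
            dx - w if dx > w // 2 else dx)
```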

Journal ArticleDOI
TL;DR: In this article, a novel deep convolutional neural network (CNN) that is deeper and wider than other existing deep networks for hyperspectral image classification is proposed, which can optimally explore local contextual interactions by jointly exploiting local spatio-spectral relationships of neighboring individual pixel vectors.
Abstract: In this paper, we describe a novel deep convolutional neural network (CNN) that is deeper and wider than other existing deep networks for hyperspectral image classification. Unlike current state-of-the-art approaches in CNN-based hyperspectral image classification, the proposed network, called contextual deep CNN, can optimally explore local contextual interactions by jointly exploiting local spatio-spectral relationships of neighboring individual pixel vectors. The joint exploitation of the spatio-spectral information is achieved by a multi-scale convolutional filter bank used as an initial component of the proposed CNN pipeline. The initial spatial and spectral feature maps obtained from the multi-scale filter bank are then combined together to form a joint spatio-spectral feature map. The joint feature map representing rich spectral and spatial properties of the hyperspectral image is then fed through a fully convolutional network that eventually predicts the corresponding label of each pixel vector. The proposed approach is tested on three benchmark datasets: the Indian Pines dataset, the Salinas dataset and the University of Pavia dataset. Performance comparison shows enhanced classification performance of the proposed approach over the current state-of-the-art on the three datasets.
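A minimal PyTorch sketch of the multi-scale convolutional filter bank idea (kernel sizes and channel counts are illustrative, not the paper's): parallel convolutions over the band-stacked input are concatenated into a joint spatio-spectral feature map, which would then feed the fully convolutional part of the network.

```python
import torch
import torch.nn as nn

class MultiScaleBank(nn.Module):
    def __init__(self, bands, out_per_scale=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(bands, out_per_scale, k, padding=k // 2)
            for k in (1, 3, 5)   # 1x1 mixes spectra; 3x3/5x5 add spatial context
        ])

    def forward(self, x):        # x: (N, bands, H, W)
        return torch.cat([b(x) for b in self.branches], dim=1)

feats = MultiScaleBank(bands=200)(torch.randn(2, 200, 27, 27))  # (2, 96, 27, 27)
```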

Journal ArticleDOI
TL;DR: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed based on low-rank and sparse representation based on the separation of the background and the anomalies in the observed data.
Abstract: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed based on low-rank and sparse representation. The proposed method is based on the separation of the background and the anomalies in the observed data. Since each pixel in the background can be approximately represented by a background dictionary and the representation coefficients of all pixels form a low-rank matrix, a low-rank representation is used to model the background part. To better characterize each pixel's local representation, a sparsity-inducing regularization term is added to the representation coefficients. Moreover, a dictionary construction strategy is adopted to make the dictionary more stable and discriminative. Then, the anomalies are determined by the response of the residual matrix. An important advantage of the proposed algorithm is that it combines the global and local structure in the HSI. Experiments have been conducted using both simulated and real data sets. These experiments indicate that our algorithm achieves very promising anomaly detection performance.

Journal ArticleDOI
TL;DR: This paper models the messages embedded by spatial least significant bit (LSB) matching as independent noises to the cover image, and reveals that the histogram of the differences between pixel gray values is smoothed by the stego bits despite a large distance between the pixels.
Abstract: This paper models the messages embedded by spatial least significant bit (LSB) matching as independent noises to the cover image, and reveals that the histogram of the differences between pixel gray values is smoothed by the stego bits despite a large distance between the pixels. Using the characteristic function of difference histogram (DHCF), we prove that the center of mass of DHCF (DHCF COM) decreases after messages are embedded. Accordingly, the DHCF COMs are calculated as distinguishing features from the pixel pairs with different distances. The features are calibrated with an image generated by average operation, and then used to train a support vector machine (SVM) classifier. The experimental results prove that the features extracted from the differences between nonadjacent pixels can help to tackle LSB matching as well.
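A minimal sketch of the feature described here, for a single pixel distance and horizontal pairs only (binning details and the calibration step are omitted): the difference histogram's characteristic function is its DFT, and the feature is that function's center of mass.

```python
import numpy as np

def dhcf_com(gray, distance=1):
    """gray: 2D uint8 array; returns the DHCF center of mass for one distance."""
    diffs = (gray[:, distance:].astype(np.int32)
             - gray[:, :-distance]).ravel()            # horizontal pixel pairs
    hist, _ = np.histogram(diffs, bins=511, range=(-255.5, 255.5))
    H = np.abs(np.fft.fft(hist))[:256]                 # characteristic function
    k = np.arange(256)
    return (k * H).sum() / (H.sum() + 1e-12)           # center of mass
```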

Journal ArticleDOI
TL;DR: A new image feature called Normalized Pixel Difference (NPD) is proposed, computed as the difference-to-sum ratio between two pixel values and inspired by the Weber Fraction in experimental psychology; the feature is scale invariant, bounded, and able to reconstruct the original image.
Abstract: We propose a method to address challenges in unconstrained face detection, such as arbitrary pose variations and occlusions. First, a new image feature called Normalized Pixel Difference (NPD) is proposed. NPD feature is computed as the difference to sum ratio between two pixel values, inspired by the Weber Fraction in experimental psychology. The new feature is scale invariant, bounded, and is able to reconstruct the original image. Second, we propose a deep quadratic tree to learn the optimal subset of NPD features and their combinations, so that complex face manifolds can be partitioned by the learned rules. This way, only a single soft-cascade classifier is needed to handle unconstrained face detection. Furthermore, we show that the NPD features can be efficiently obtained from a look up table, and the detection template can be easily scaled, making the proposed face detector very fast. Experimental results on three public face datasets (FDDB, GENKI, and CMU-MIT) show that the proposed method achieves state-of-the-art performance in detecting unconstrained faces with arbitrary pose variations and occlusions in cluttered scenes.
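The NPD feature itself is nearly a one-liner; a sketch following the difference-to-sum description above, with the 0/0 case set to 0 (an assumption here for well-definedness, sign convention aside):

```python
import numpy as np

def npd(x, y):
    """Difference-to-sum ratio of two pixel values, bounded in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    s = x + y
    out = np.zeros_like(s)                       # 0/0 case defaults to 0
    np.divide(x - y, s, out=out, where=s != 0)
    return out

print(npd(200, 40))  # 0.666...
```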

Journal ArticleDOI
TL;DR: This study proves UAV-based SfM has the potential to become a new standard for high-throughput phenotyping of in-field crop heights and provides a novel spatial mapping of crop height variation both at the field scale and also within individual plots.

Abstract: There is a growing need to increase global crop yields, whilst minimising use of resources such as land, fertilisers and water. Agricultural researchers use ground-based observations to identify, select and develop crops with favourable genotypes and phenotypes; however, the ability to collect rapid, high quality and high volume phenotypic data in open fields is restricting this. This study develops and assesses a method for deriving crop height and growth rate rapidly from multi-temporal, very high spatial resolution (1 cm/pixel), 3D digital surface models of crop field trials produced via Structure from Motion (SfM) photogrammetry using aerial imagery collected through repeated campaigns flying an Unmanned Aerial Vehicle (UAV) with a mounted Red Green Blue (RGB) camera. We compare UAV SfM modelled crop heights to those derived from a terrestrial laser scanner (TLS) and to the standard field measurement of crop height conducted using a 2 m rule. The most accurate UAV-derived surface model and the TLS both achieve a Root Mean Squared Error (RMSE) of 0.03 m compared to the existing manual 2 m rule method. The optimised UAV method was then applied to the growing season of a winter wheat field phenotyping experiment containing 25 different varieties grown in 27 m² plots and subject to four different nitrogen fertiliser treatments. Accuracy assessments at different stages of crop growth produced consistently low RMSE values (0.07, 0.02 and 0.03 m for May, June and July, respectively), enabling crop growth rate to be derived from differencing of the multi-temporal surface models. We find growth rates range from −13 mm/day to 17 mm/day. Our results clearly display the impact of variable nitrogen fertiliser rates on crop growth. The digital surface models produced provide a novel spatial mapping of crop height variation both at the field scale and also within individual plots. This study proves UAV-based SfM has the potential to become a new standard for high-throughput phenotyping of in-field crop heights.

Posted Content
TL;DR: In this paper, the feature maps are extracted in the LR space and an efficient sub-pixel convolution layer is introduced to upscale the final LR feature maps into the HR output, which reduces the computational complexity of the overall SR operation.
Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

Journal ArticleDOI
TL;DR: This paper presents an algorithm for electron tomographic reconstruction and sparse image interpolation that exploits the nonlocal redundancy in images, and demonstrates that the algorithm produces higher quality reconstructions on both simulated and real electron microscope data, along with improved convergence properties compared to other methods.
Abstract: Many material and biological samples in scientific imaging are characterized by nonlocal repeating structures. These are studied using scanning electron microscopy and electron tomography. Sparse sampling of individual pixels in a two-dimensional image acquisition geometry, or sparse sampling of projection images with large tilt increments in a tomography experiment, can enable high speed data acquisition and minimize sample damage caused by the electron beam. In this paper, we present an algorithm for electron tomographic reconstruction and sparse image interpolation that exploits the nonlocal redundancy in images. We adapt a framework, termed plug-and-play priors, to solve these imaging problems in a regularized inversion setting. The power of the plug-and-play approach is that it allows a wide array of modern denoising algorithms to be used as a “prior model” for tomography and image interpolation. We also present sufficient mathematical conditions that ensure convergence of the plug-and-play approach, and we use these insights to design a new nonlocal means denoising algorithm. Finally, we demonstrate that the algorithm produces higher quality reconstructions on both simulated and real electron microscope data, along with improved convergence properties compared to other methods.
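A minimal sketch of the plug-and-play ADMM iteration the framework builds on (generic notation and my own parameter names): the prior model enters only through a denoiser, which is exactly the flexibility the abstract highlights.

```python
import numpy as np

def plug_and_play(x0, prox_data, denoise, iters=50, rho=1.0):
    """prox_data(z, rho): proximal map of the data-fidelity term.
    denoise(z): any denoiser, acting as the 'prior model'."""
    x = v = x0.copy()
    u = np.zeros_like(x0)
    for _ in range(iters):
        x = prox_data(v - u, rho)   # enforce consistency with the measurements
        v = denoise(x + u)          # plug in the prior via the denoiser
        u = u + x - v               # dual (scaled Lagrange multiplier) update
    return v
```

The paper's convergence conditions constrain what the denoiser may be; this sketch assumes a suitable choice, such as a nonlocal means variant.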

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A new energy on the vertices of a regularly sampled spatiotemporal bilateral grid is designed, which can be solved efficiently using a standard graph cut label assignment, and implicitly approximates long-range, spatio-temporal connections between pixels while still containing only a small number of variables and only local graph edges.
Abstract: In this work, we propose a novel approach to video segmentation that operates in bilateral space. We design a new energy on the vertices of a regularly sampled spatiotemporal bilateral grid, which can be solved efficiently using a standard graph cut label assignment. Using a bilateral formulation, the energy that we minimize implicitly approximates long-range, spatio-temporal connections between pixels while still containing only a small number of variables and only local graph edges. We compare to a number of recent methods, and show that our approach achieves state-of-the-art results on multiple benchmarks in a fraction of the runtime. Furthermore, our method scales linearly with image size, allowing for interactive feedback on real-world high resolution video.

Book ChapterDOI
08 Oct 2016
TL;DR: An automatic image matting method for portrait images that does not need user interaction is proposed and achieves comparable results with state-of-the-art methods that require specified foreground and background regions or pixels.
Abstract: We propose an automatic image matting method for portrait images. This method does not need user interaction, which was, however, essential in most previous approaches. To accomplish this goal, a new end-to-end convolutional neural network (CNN) based framework is proposed that takes a portrait image as input and outputs the matte result. Our method considers not only image semantic prediction but also pixel-level image matte optimization. A new portrait image dataset is constructed with our labeled matting ground truth. Our automatic method achieves comparable results with state-of-the-art methods that require specified foreground and background regions or pixels. Many applications are enabled given the automatic nature of our system.

Journal ArticleDOI
Puzhao Zhang, Maoguo Gong, Linzhi Su, Jia Liu, Li Zhizhou
TL;DR: This paper presents a novel multi-spatial-resolution change detection framework, which incorporates deep-architecture-based unsupervised feature learning and mapping-based feature change analysis, and tries to explore the inner relationships between them by building a mapping neural network.
Abstract: Multi-spatial-resolution change detection is a newly proposed issue and it is of great significance in remote sensing, environmental and land use monitoring, etc. Though the two images of a multi-spatial-resolution pair are representations of the same reality, they are often superficially incommensurable due to their different modalities and properties. In this paper, we present a novel multi-spatial-resolution change detection framework, which incorporates deep-architecture-based unsupervised feature learning and mapping-based feature change analysis. Firstly, we transform the multi-resolution image-pair into the same pixel resolution through co-registration, followed by detail recovery, which is designed to remedy the spatial details lost in the registration. Secondly, a denoising autoencoder is stacked to learn a local, high-level representation from the neighborhood of each given pixel, in an unsupervised fashion. Thirdly, motivated by the fact that the multi-resolution image-pair shares the same reality in the unchanged regions, we explore the inner relationships between them by building a mapping neural network, which learns a mapping function based on the most-unlikely-changed feature-pairs, selected from all feature-pairs via a coarse initial change map generated in advance. The learned mapping function can bridge the different representations and highlight changes. Finally, we build a robust and contractive change map through feature similarity analysis, and the change detection result is obtained through segmentation of the final change map. Experiments are carried out on four real datasets, and the results confirm the effectiveness and superiority of the proposed method.

Journal ArticleDOI
12 Jan 2016-ACS Nano
TL;DR: A plasmonic filter set with polarization-switchable color properties, based upon arrays of asymmetric cross-shaped nanoapertures in an aluminum thin-film, used to create micro image displays containing duality in their optical information states.
Abstract: Color filters based upon nanostructured metals have garnered significant interest in recent years, having been positioned as alternatives to the organic dye-based filters which provide color selectivity in image sensors, as nonfading “printing” technologies for producing images with nanometer pixel resolution, and as ultra-high-resolution, small foot-print optical storage and encoding solutions. Here, we demonstrate a plasmonic filter set with polarization-switchable color properties, based upon arrays of asymmetric cross-shaped nanoapertures in an aluminum thin-film. Acting as individual color-emitting nanopixels, the plasmonic cavity-apertures have dual-color selectivity, transmitting one of two visible colors, controlled by the polarization of the white light incident on the rear of the pixel and tuned by varying the critical dimensions of the geometry and periodicity of the array. This structural approach to switchable optical filtering enables a single nanoaperture to encode two information states wi...

Proceedings ArticleDOI
01 Jun 2016
TL;DR: An approach to long-range spatio-temporal regularization in semantic video segmentation by optimizing the mapping of pixels to a Euclidean feature space so as to minimize distances between corresponding points.
Abstract: We present an approach to long-range spatio-temporal regularization in semantic video segmentation. Temporal regularization in video is challenging because both the camera and the scene may be in motion. Thus Euclidean distance in the space-time volume is not a good proxy for correspondence. We optimize the mapping of pixels to a Euclidean feature space so as to minimize distances between corresponding points. Structured prediction is performed by a dense CRF that operates on the optimized features. Experimental results demonstrate that the presented approach increases the accuracy and temporal consistency of semantic video segmentation.

Journal ArticleDOI
TL;DR: This letter presents a novel change detection method for multitemporal synthetic aperture radar images based on PCANet that exploits representative neighborhood features from each pixel, using PCA filters as convolutional filters to generate change maps with fewer noise spots.
Abstract: This letter presents a novel change detection method for multitemporal synthetic aperture radar images based on PCANet. This method exploits representative neighborhood features from each pixel using PCA filters as convolutional filters. Thus, the proposed method is more robust to speckle noise and can generate change maps with fewer noise spots. Given two multitemporal images, Gabor wavelets and fuzzy c-means are utilized to select interested pixels that have a high probability of being changed or unchanged. Then, new image patches centered at interested pixels are generated and a PCANet model is trained using these patches. Finally, pixels in the multitemporal images are classified by the trained PCANet model. The PCANet classification result and the preclassification result are combined to form the final change map. The experimental results obtained on three real SAR image data sets confirm the effectiveness of the proposed method.
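A minimal sketch of the core PCANet ingredient mentioned here, learning PCA filters from neighborhood patches (a simplified single stage; following PCANet, each patch's mean is removed before the PCA):

```python
import numpy as np

def pca_filters(patches, n_filters=8):
    """patches: (N, k, k) array of neighborhood patches around chosen pixels;
    returns n_filters k x k convolutional filters."""
    X = patches.reshape(len(patches), -1).astype(np.float64)
    X -= X.mean(axis=1, keepdims=True)      # remove each patch's mean (DC)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    k = patches.shape[1]
    # leading principal directions, reshaped into k x k convolutional filters
    return Vt[:n_filters].reshape(n_filters, k, k)
```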

Journal ArticleDOI
01 Feb 2016-Optik
TL;DR: A method for detecting crack patterns in cement using image processing techniques; the advantage of this method is clear and accurate detection of cracks in images.

Journal ArticleDOI
TL;DR: Experimental results and security analysis demonstrate that the proposed algorithm offers high security and fast speed, and can resist various attacks.

Book ChapterDOI
08 Oct 2016
TL;DR: In this article, a novel family of "clockwork" convnets driven by fixed or adaptive clock signals is proposed to schedule the processing of different layers at different update rates according to their semantic stability.
Abstract: Recent years have seen tremendous progress in still-image segmentation; however the naive application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent in video. We propose a video recognition framework that relies on two key observations: (1) while pixels may change rapidly from frame to frame, the semantic content of a scene evolves more slowly, and (2) execution can be viewed as an aspect of architecture, yielding purpose-fit computation schedules for networks. We define a novel family of “clockwork” convnets driven by fixed or adaptive clock signals that schedule the processing of different layers at different update rates according to their semantic stability. We design a pipeline schedule to reduce latency for real-time recognition and a fixed-rate schedule to reduce overall computation. Finally, we extend clockwork scheduling to adaptive video processing by incorporating data-driven clocks that can be tuned on unlabeled video. The accuracy and efficiency of clockwork convnets are evaluated on the Youtube-Objects, NYUD, and Cityscapes video datasets.
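A minimal sketch of the adaptive-clock idea (my simplification; `shallow` and `deep` stand for the lower and upper portions of a segmentation network, and the threshold is an illustrative stand-in for a tuned, data-driven clock): the deep layers are recomputed only when the cheap early features have changed enough, otherwise their cached output is reused.

```python
import numpy as np

def clockwork_video(frames, shallow, deep, thresh=0.1):
    """frames: iterable of arrays; shallow/deep: callables over features."""
    outputs, cached_deep, last_feat = [], None, None
    for f in frames:
        feat = shallow(f)                       # always run the cheap layers
        if last_feat is None or np.abs(feat - last_feat).mean() > thresh:
            cached_deep = deep(feat)            # clock fires: full update
            last_feat = feat
        outputs.append(cached_deep)             # else reuse persisted features
    return outputs
```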

Book ChapterDOI
Donggeun Yoo, Namil Kim, Sunggyun Park, Anthony S. Paek, In So Kweon
08 Oct 2016
TL;DR: The model transfers an input domain to a target domain at the semantic level and generates the target image at the pixel level, employing the real/fake-discriminator as in Generative Adversarial Nets to generate realistic target images.
Abstract: We present an image-conditional image generation model. The model transfers an input domain to a target domain at the semantic level, and generates the target image at the pixel level. To generate realistic target images, we employ the real/fake-discriminator as in Generative Adversarial Nets [6], but also introduce a novel domain-discriminator to make the generated image relevant to the input image. We verify our model through a challenging task of generating a piece of clothing from an input image of a dressed person. We present a high quality clothing dataset containing the two domains, and succeed in demonstrating decent results.