
Showing papers on "Pixel published in 2016"


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

4,770 citations
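For illustration, a minimal PyTorch sketch of the sub-pixel convolution idea (layer sizes and the use of nn.PixelShuffle are my own choices, not necessarily the paper's exact configuration): the network operates entirely in LR space, and the final convolution emits r² feature maps per output channel, which are rearranged into the HR image.

```python
import torch
import torch.nn as nn

class ESPCNSketch(nn.Module):
    def __init__(self, upscale=3, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, 3, padding=1), nn.Tanh(),
            # the last conv emits upscale^2 feature maps per image channel...
            nn.Conv2d(32, channels * upscale ** 2, 3, padding=1),
        )
        # ...which the sub-pixel layer rearranges into the rH x rW output
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, lr):
        return self.shuffle(self.body(lr))

hr = ESPCNSketch()(torch.randn(1, 1, 120, 120))  # -> shape (1, 1, 360, 360)
```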


Proceedings Article
19 Jun 2016
TL;DR: A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.
Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.

1,801 citations
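The sequential, raster-scan dependency structure can be illustrated with the kind of causal mask used in the paper's masked convolutions (a sketch of the single-channel spatial mask only; the full model also masks dependencies among the R, G, B channels):

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """k x k mask for a masked convolution: only pixels above, and to the
    left in the current row, are visible; type 'B' also keeps the center."""
    m = np.zeros((k, k), dtype=np.float32)
    m[:k // 2, :] = 1.0            # all rows above the center
    m[k // 2, :k // 2] = 1.0       # current row, strictly left of center
    if mask_type == "B":
        m[k // 2, k // 2] = 1.0    # center pixel itself
    return m

print(causal_mask(3, "A"))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```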


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work proposes an algorithm that is linear in the size of the image, deterministic, and requires no training; it performs well on a wide variety of images and is competitive with other state-of-the-art methods on the single image dehazing problem.
Abstract: Haze limits visibility and reduces image contrast in outdoor images. The degradation is different for every pixel and depends on the distance of the scene point from the camera. This dependency is expressed in the transmission coefficients, which control the scene attenuation and amount of haze in every pixel. Previous methods solve the single image dehazing problem using various patch-based priors. We, on the other hand, propose an algorithm based on a new, non-local prior. The algorithm relies on the assumption that colors of a haze-free image are well approximated by a few hundred distinct colors that form tight clusters in RGB space. Our key observation is that pixels in a given cluster are often non-local, i.e., they are spread over the entire image plane and are located at different distances from the camera. In the presence of haze these varying distances translate to different transmission coefficients. Therefore, each color cluster in the clear image becomes a line in RGB space, which we term a haze-line. Using these haze-lines, our algorithm recovers both the distance map and the haze-free image. The algorithm is linear in the size of the image, deterministic and requires no training. It performs well on a wide variety of images and is competitive with other state-of-the-art methods.

1,082 citations
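A minimal sketch of the transmission estimate implied by the haze-line model, assuming the airlight A is already known and substituting KMeans on the normalized I − A directions for the paper's own clustering scheme (the cluster count and the lower clipping bound here are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_transmission(img, A, n_lines=500):
    """img: HxWx3 float array; A: length-3 airlight vector."""
    ia = img.reshape(-1, 3) - A          # translate so the airlight is the origin
    r = np.linalg.norm(ia, axis=1) + 1e-9
    dirs = ia / r[:, None]               # unit direction = which haze-line
    labels = KMeans(n_clusters=n_lines, n_init=4).fit_predict(dirs)
    # the farthest pixel on each haze-line is assumed haze-free: t = r / r_max
    r_max = np.zeros(n_lines)
    np.maximum.at(r_max, labels, r)
    t = r / r_max[labels]
    return np.clip(t, 0.1, 1.0).reshape(img.shape[:2])
```

The haze-free image then follows by inverting the standard haze model I = tJ + (1 − t)A per pixel.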


Posted Content
TL;DR: In this paper, a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions is presented. The model captures the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image.
Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.

970 citations


Proceedings ArticleDOI
Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun
01 Jun 2016
TL;DR: This paper proposes to use scribbles to annotate images, and develops an algorithm to train convolutional networks for semantic segmentation supervised by scribbles.
Abstract: Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure. We note that for the topic of interactive image segmentation, scribbles are very widely used in academic research and commercial software, and are recognized as one of the most user-friendly ways of interacting. In this paper, we propose to use scribbles to annotate images, and develop an algorithm to train convolutional networks for semantic segmentation supervised by scribbles. Our algorithm is based on a graphical model that jointly propagates information from scribbles to unmarked pixels and learns network parameters. We present competitive object semantic segmentation results on the PASCAL VOC dataset by using scribbles as annotations. Scribbles are also favored for annotating stuff (e.g., water, sky, grass) that has no well-defined shape, and our method shows excellent results on the PASCAL-CONTEXT dataset thanks to extra inexpensive scribble annotations. Our scribble annotations on PASCAL VOC are available at http://research.microsoft.com/en-us/um/people/jifdai/downloads/scribble_sup.

748 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work introduces a linear approximation of the min operator to compute the dark channel, achieves state-of-the-art results on deblurring natural images, and compares favorably with methods that are well-engineered for specific scenarios.
Abstract: We present a simple and effective blind image deblurring method based on the dark channel prior. Our work is inspired by the interesting observation that the dark channel of blurred images is less sparse. While most image patches in the clean image contain some dark pixels, these pixels are not dark when averaged with neighboring high-intensity pixels during the blur process. This change in the sparsity of the dark channel is an inherent property of the blur process, which we both prove mathematically and validate using training data. Therefore, enforcing the sparsity of the dark channel helps blind deblurring in various scenarios, including natural, face, text, and low-illumination images. However, sparsity of the dark channel introduces a non-convex non-linear optimization problem. We introduce a linear approximation of the min operator to compute the dark channel. Our look-up-table-based method converges fast in practice and can be directly extended to non-uniform deblurring. Extensive experiments show that our method achieves state-of-the-art results on deblurring natural images and compares favorably with methods that are well-engineered for specific scenarios.

682 citations
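A minimal sketch of the dark channel itself, the quantity whose sparsity the prior enforces: per pixel, the minimum intensity over the color channels and over a local patch (the patch size here is an arbitrary illustrative choice):

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=35):
    """img: HxWx3 float array in [0, 1]; returns the HxW dark channel."""
    per_pixel_min = img.min(axis=2)                   # min over color channels
    return minimum_filter(per_pixel_min, size=patch)  # min over the local patch
```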


Journal ArticleDOI
TL;DR: Some popular and state-of-the-art fusion methods at different levels, especially the pixel level, are reviewed, and various approaches and metrics for assessing the fused product are presented.

574 citations


Posted Content
TL;DR: The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.
Abstract: We propose a novel direct sparse visual odometry formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry -- represented as inverse depth in a reference frame -- and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.

557 citations
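A minimal sketch of the even, gradient-based pixel sampling the abstract describes, simplified here to picking the strongest-gradient pixel in each cell of a coarse grid when it exceeds a threshold (cell size and threshold are my own illustrative values; the paper's selection strategy is more elaborate):

```python
import numpy as np

def select_pixels(gray, cell=16, thresh=8.0):
    """gray: 2D float array; returns (row, col) points spread evenly over
    all image regions that carry intensity gradient."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    pts = []
    h, w = gray.shape
    for y in range(0, h - cell, cell):
        for x in range(0, w - cell, cell):
            block = mag[y:y + cell, x:x + cell]
            dy, dx = np.unravel_index(block.argmax(), block.shape)
            if block[dy, dx] > thresh:          # keep only gradient-bearing cells
                pts.append((y + dy, x + dx))
    return np.array(pts)
```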


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper describes a local region in an image via a hierarchical Gaussian distribution whose parameters include both means and covariances, and shows that the proposed descriptor exhibits remarkably high performance, outperforming the state-of-the-art descriptors for person re-identification.
Abstract: Describing the color and textural information of a person image is one of the most crucial aspects of person re-identification. In this paper, we present a novel descriptor based on a hierarchical distribution of pixel features. A hierarchical covariance descriptor has been successfully applied for image classification. However, the mean information of pixel features, which is absent in covariance, tends to be major discriminative information of person images. To solve this problem, we describe a local region in an image via a hierarchical Gaussian distribution in which both means and covariances are included in its parameters. More specifically, we model the region as a set of multiple Gaussian distributions in which each Gaussian represents the appearance of a local patch. The characteristics of the set of Gaussians are again described by another Gaussian distribution. In both steps, unlike the hierarchical covariance descriptor, the proposed descriptor can model both the mean and the covariance information of pixel features properly. The results of experiments conducted on five databases indicate that the proposed descriptor exhibits remarkably high performance which outperforms the state-of-the-art descriptors for person re-identification.

554 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper proposes an end-to-end deep contrast network consisting of two complementary components: a pixel-level fully convolutional stream and a segment-wise spatial pooling stream.
Abstract: Salient object detection has recently witnessed substantial progress due to powerful features extracted using deep convolutional neural networks (CNNs). However, existing CNN-based methods operate at the patch level instead of the pixel level. Resulting saliency maps are typically blurry, especially near the boundary of salient objects. Furthermore, image patches are treated as independent samples even when they are overlapping, giving rise to significant redundancy in computation and storage. In this paper, we propose an end-to-end deep contrast network to overcome the aforementioned limitations. Our deep network consists of two complementary components, a pixel-level fully convolutional stream and a segment-wise spatial pooling stream. The first stream directly produces a saliency map with pixel-level accuracy from an input image. The second stream extracts segment-wise features very efficiently, and better models saliency discontinuities along object boundaries. Finally, a fully connected CRF model can be optionally incorporated to improve spatial coherence and contour localization in the fused result from these two streams. Experimental results demonstrate that our deep model significantly improves the state of the art.

553 citations


Journal ArticleDOI
Qiaoliang Li, Bowei Feng, Linpei Xie, Ping Liang, Huisheng Zhang, Tianfu Wang
TL;DR: A wide and deep neural network with strong induction ability is proposed to model the transformation, and an efficient training strategy is presented, where instead of a single label of the center pixel, the network can output the label map of all pixels for a given image patch.
Abstract: This paper presents a new supervised method for vessel segmentation in retinal images. This method remolds the task of segmentation as a problem of cross-modality data transformation from retinal image to vessel map. A wide and deep neural network with strong induction ability is proposed to model the transformation, and an efficient training strategy is presented. Instead of a single label of the center pixel, the network can output the label map of all pixels for a given image patch. Our approach outperforms reported state-of-the-art methods in terms of sensitivity, specificity and accuracy. The result of cross-training evaluation indicates its robustness to the training set. The approach needs no artificially designed feature and no preprocessing step, reducing the impact of subjective factors. The proposed method has the potential for application in image diagnosis of ophthalmologic diseases, and it may provide a new, general, high-performance computing framework for image segmentation.

Journal ArticleDOI
TL;DR: Shake-The-Box is a Lagrangian tracking method that uses a prediction of the particle distribution for the subsequent time-step as a means to seize the temporal domain.
Abstract: A Lagrangian tracking method is introduced, which uses a prediction of the particle distribution for the subsequent time-step as a means to seize the temporal domain. Errors introduced by the prediction process are corrected by an image matching technique (‘shaking’ the particle in space), followed by an iterative triangulation of particles newly entering the measurement domain. The scheme was termed ‘Shake-The-Box’ and previously characterized as ‘4D-PTV’ due to the strong interaction with the temporal dimension. Trajectories of tracer particles are identified at high spatial accuracy due to a nearly complete suppression of ghost particles; a temporal filtering scheme further improves on accuracy and allows for the extraction of local velocity and acceleration as derivatives of a continuous function. Exploiting the temporal information enables the processing of densely seeded flows (beyond 0.1 particles per pixel, ppp), which were previously reserved for tomographic PIV evaluations. While TOMO-PIV uses statistical means to evaluate the flow (building an ‘anonymous’ voxel space with subsequent spatial averaging of the velocity information using correlation), the Shake-The-Box approach is able to identify and track individual particles at numbers of tens or even hundreds of thousands per time-step. The method is outlined in detail, followed by descriptions of applications to synthetic and experimental data. The synthetic data evaluation reveals that STB is able to capture virtually all true particles, while effectively suppressing the formation of ghost particles. For the examined four-camera set-up, particle image densities N_I up to 0.125 ppp could be processed. For noise-free images, the attained accuracy is very high. The addition of synthetic noise reduces usable particle image density (N_I ≤ 0.075 ppp for highly noisy images) and accuracy (still being significantly higher compared to tomographic reconstruction). The solutions remain virtually free of ghost particles. Processing an experimental data set on a transitional jet in water demonstrates the benefits of advanced Lagrangian evaluation in describing flow details—both on small scales (by the individual tracks) and on larger structures (using an interpolation onto an Eulerian grid). Comparisons to standard TOMO-PIV processing for synthetic and experimental evaluations show distinct benefits in local accuracy, completeness of the solution, ghost particle occurrence, spatial resolution, temporal coherence and computational effort.

Journal ArticleDOI
11 Nov 2016
TL;DR: A computational photography pipeline that captures, aligns, and merges a burst of frames to reduce noise and increase dynamic range, built atop Android's Camera2 API and written in the Halide domain-specific language (DSL).
Abstract: Cell phone cameras have small apertures, which limits the number of photons they can gather, leading to noisy images in low light. They also have small sensor pixels, which limits the number of electrons each pixel can store, leading to limited dynamic range. We describe a computational photography pipeline that captures, aligns, and merges a burst of frames to reduce noise and increase dynamic range. Our system has several key features that help make it robust and efficient. First, we do not use bracketed exposures. Instead, we capture frames of constant exposure, which makes alignment more robust, and we set this exposure low enough to avoid blowing out highlights. The resulting merged image has clean shadows and high bit depth, allowing us to apply standard HDR tone mapping methods. Second, we begin from Bayer raw frames rather than the demosaicked RGB (or YUV) frames produced by hardware Image Signal Processors (ISPs) common on mobile platforms. This gives us more bits per pixel and allows us to circumvent the ISP's unwanted tone mapping and spatial denoising. Third, we use a novel FFT-based alignment algorithm and a hybrid 2D/3D Wiener filter to denoise and merge the frames in a burst. Our implementation is built atop Android's Camera2 API, which provides per-frame camera control and access to raw imagery, and is written in the Halide domain-specific language (DSL). It runs in 4 seconds on device (for a 12 Mpix image), requires no user intervention, and ships on several mass-produced cell phones.
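For flavor, a sketch of FFT-based translational alignment using plain phase correlation; this is a classical stand-in for illustration, not the paper's coarse-to-fine tile-based algorithm:

```python
import numpy as np

def phase_correlate(ref, alt):
    """Estimate the integer (dy, dx) shift aligning `alt` to `ref` (2D arrays)."""
    F1, F2 = np.fft.fft2(ref), np.fft.fft2(alt)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12           # keep phase only
    corr = np.fft.ifft2(cross).real          # impulse at the relative shift
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    h, w = ref.shape
    return (dy - h if dy > h // 2 else dy,   # wrap to signed shifts
            dx - w if dx > w // 2 else dx)
```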

Journal ArticleDOI
TL;DR: In this article, a novel deep convolutional neural network (CNN) that is deeper and wider than other existing deep networks for hyperspectral image classification is proposed, which can optimally explore local contextual interactions by jointly exploiting local spatio-spectral relationships of neighboring individual pixel vectors.
Abstract: In this paper, we describe a novel deep convolutional neural network (CNN) that is deeper and wider than other existing deep networks for hyperspectral image classification. Unlike current state-of-the-art approaches in CNN-based hyperspectral image classification, the proposed network, called contextual deep CNN, can optimally explore local contextual interactions by jointly exploiting local spatio-spectral relationships of neighboring individual pixel vectors. The joint exploitation of the spatio-spectral information is achieved by a multi-scale convolutional filter bank used as an initial component of the proposed CNN pipeline. The initial spatial and spectral feature maps obtained from the multi-scale filter bank are then combined together to form a joint spatio-spectral feature map. The joint feature map representing rich spectral and spatial properties of the hyperspectral image is then fed through a fully convolutional network that eventually predicts the corresponding label of each pixel vector. The proposed approach is tested on three benchmark datasets: the Indian Pines dataset, the Salinas dataset and the University of Pavia dataset. Performance comparison shows enhanced classification performance of the proposed approach over the current state-of-the-art on the three datasets.
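A minimal PyTorch sketch of the multi-scale convolutional filter bank idea (kernel sizes and channel counts are illustrative, not the paper's): parallel convolutions over the band-stacked input are concatenated into a joint spatio-spectral feature map, which would then feed the fully convolutional part of the network.

```python
import torch
import torch.nn as nn

class MultiScaleBank(nn.Module):
    def __init__(self, bands, out_per_scale=32):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(bands, out_per_scale, k, padding=k // 2)
            for k in (1, 3, 5)   # 1x1 mixes spectra; 3x3/5x5 add spatial context
        ])

    def forward(self, x):        # x: (N, bands, H, W)
        return torch.cat([b(x) for b in self.branches], dim=1)

feats = MultiScaleBank(bands=200)(torch.randn(2, 200, 27, 27))  # (2, 96, 27, 27)
```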

Journal ArticleDOI
TL;DR: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed based on low-rank and sparse representation based on the separation of the background and the anomalies in the observed data.
Abstract: A novel method for anomaly detection in hyperspectral images (HSIs) is proposed based on low-rank and sparse representation. The proposed method is based on the separation of the background and the anomalies in the observed data. Since each pixel in the background can be approximately represented by a background dictionary and the representation coefficients of all pixels form a low-rank matrix, a low-rank representation is used to model the background part. To better characterize each pixel's local representation, a sparsity-inducing regularization term is added to the representation coefficients. Moreover, a dictionary construction strategy is adopted to make the dictionary more stable and discriminative. Then, the anomalies are determined by the response of the residual matrix. An important advantage of the proposed algorithm is that it combines the global and local structure in the HSI. Experiments have been conducted using both simulated and real data sets. These experiments indicate that our algorithm achieves very promising anomaly detection performance.

Journal ArticleDOI
TL;DR: This paper models the messages embedded by spatial least significant bit (LSB) matching as independent noises to the cover image, and reveals that the histogram of the differences between pixel gray values is smoothed by the stego bits despite a large distance between the pixels.
Abstract: This paper models the messages embedded by spatial least significant bit (LSB) matching as independent noises to the cover image, and reveals that the histogram of the differences between pixel gray values is smoothed by the stego bits despite a large distance between the pixels. Using the characteristic function of difference histogram (DHCF), we prove that the center of mass of DHCF (DHCF COM) decreases after messages are embedded. Accordingly, the DHCF COMs are calculated as distinguishing features from the pixel pairs with different distances. The features are calibrated with an image generated by average operation, and then used to train a support vector machine (SVM) classifier. The experimental results prove that the features extracted from the differences between nonadjacent pixels can help to tackle LSB matching as well.
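A minimal sketch of the feature described here, for a single pixel distance and horizontal pairs only (binning details and the calibration step are omitted): the difference histogram's characteristic function is its DFT, and the feature is that function's center of mass.

```python
import numpy as np

def dhcf_com(gray, distance=1):
    """gray: 2D uint8 array; returns the DHCF center of mass for one distance."""
    diffs = (gray[:, distance:].astype(np.int32)
             - gray[:, :-distance]).ravel()            # horizontal pixel pairs
    hist, _ = np.histogram(diffs, bins=511, range=(-255.5, 255.5))
    H = np.abs(np.fft.fft(hist))[:256]                 # characteristic function
    k = np.arange(256)
    return (k * H).sum() / (H.sum() + 1e-12)           # center of mass
```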

Journal ArticleDOI
TL;DR: A new image feature called Normalized Pixel Difference (NPD) is proposed, computed as the difference-to-sum ratio between two pixel values and inspired by the Weber Fraction in experimental psychology; the feature is scale invariant, bounded, and able to reconstruct the original image.
Abstract: We propose a method to address challenges in unconstrained face detection, such as arbitrary pose variations and occlusions. First, a new image feature called Normalized Pixel Difference (NPD) is proposed. NPD feature is computed as the difference to sum ratio between two pixel values, inspired by the Weber Fraction in experimental psychology. The new feature is scale invariant, bounded, and is able to reconstruct the original image. Second, we propose a deep quadratic tree to learn the optimal subset of NPD features and their combinations, so that complex face manifolds can be partitioned by the learned rules. This way, only a single soft-cascade classifier is needed to handle unconstrained face detection. Furthermore, we show that the NPD features can be efficiently obtained from a look up table, and the detection template can be easily scaled, making the proposed face detector very fast. Experimental results on three public face datasets (FDDB, GENKI, and CMU-MIT) show that the proposed method achieves state-of-the-art performance in detecting unconstrained faces with arbitrary pose variations and occlusions in cluttered scenes.
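The NPD feature itself is nearly a one-liner; a sketch following the difference-to-sum description above, with the 0/0 case set to 0 (an assumption here for well-definedness, sign convention aside):

```python
import numpy as np

def npd(x, y):
    """Difference-to-sum ratio of two pixel values, bounded in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    s = x + y
    out = np.zeros_like(s)                       # 0/0 case defaults to 0
    np.divide(x - y, s, out=out, where=s != 0)
    return out

print(npd(200, 40))  # 0.666...
```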

Journal ArticleDOI
TL;DR: This study proves UAV-based SfM has the potential to become a new standard for high-throughput phenotyping of in-field crop heights and provides a novel spatial mapping of crop height variation both at the field scale and also within individual plots.

Abstract: There is a growing need to increase global crop yields, whilst minimising use of resources such as land, fertilisers and water. Agricultural researchers use ground-based observations to identify, select and develop crops with favourable genotypes and phenotypes; however, the ability to collect rapid, high quality and high volume phenotypic data in open fields is restricting this. This study develops and assesses a method for deriving crop height and growth rate rapidly from multi-temporal, very high spatial resolution (1 cm/pixel), 3D digital surface models of crop field trials produced via Structure from Motion (SfM) photogrammetry using aerial imagery collected through repeated campaigns flying an Unmanned Aerial Vehicle (UAV) with a mounted Red Green Blue (RGB) camera. We compare UAV SfM modelled crop heights to those derived from a terrestrial laser scanner (TLS) and to the standard field measurement of crop height conducted using a 2 m rule. The most accurate UAV-derived surface model and the TLS both achieve a Root Mean Squared Error (RMSE) of 0.03 m compared to the existing manual 2 m rule method. The optimised UAV method was then applied to the growing season of a winter wheat field phenotyping experiment containing 25 different varieties grown in 27 m² plots and subject to four different nitrogen fertiliser treatments. Accuracy assessments at different stages of crop growth produced consistently low RMSE values (0.07, 0.02 and 0.03 m for May, June and July, respectively), enabling crop growth rate to be derived from differencing of the multi-temporal surface models. We find growth rates range from −13 mm/day to 17 mm/day. Our results clearly display the impact of variable nitrogen fertiliser rates on crop growth. The digital surface models produced provide a novel spatial mapping of crop height variation both at the field scale and also within individual plots. This study proves UAV-based SfM has the potential to become a new standard for high-throughput phenotyping of in-field crop heights.

Posted Content
TL;DR: In this paper, the feature maps are extracted in the LR space and an efficient sub-pixel convolution layer is introduced to upscale the final LR feature maps into the HR output, which reduces the computational complexity of the overall SR operation.
Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

Journal ArticleDOI
TL;DR: This paper presents an algorithm for electron tomographic reconstruction and sparse image interpolation that exploits the nonlocal redundancy in images, and demonstrates that the algorithm produces higher quality reconstructions on both simulated and real electron microscope data, along with improved convergence properties compared to other methods.
Abstract: Many material and biological samples in scientific imaging are characterized by nonlocal repeating structures. These are studied using scanning electron microscopy and electron tomography. Sparse sampling of individual pixels in a two-dimensional image acquisition geometry, or sparse sampling of projection images with large tilt increments in a tomography experiment, can enable high speed data acquisition and minimize sample damage caused by the electron beam. In this paper, we present an algorithm for electron tomographic reconstruction and sparse image interpolation that exploits the nonlocal redundancy in images. We adapt a framework, termed plug-and-play priors, to solve these imaging problems in a regularized inversion setting. The power of the plug-and-play approach is that it allows a wide array of modern denoising algorithms to be used as a “prior model” for tomography and image interpolation. We also present sufficient mathematical conditions that ensure convergence of the plug-and-play approach, and we use these insights to design a new nonlocal means denoising algorithm. Finally, we demonstrate that the algorithm produces higher quality reconstructions on both simulated and real electron microscope data, along with improved convergence properties compared to other methods.
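A minimal sketch of the plug-and-play ADMM iteration the framework builds on (generic notation and my own parameter names): the prior model enters only through a denoiser, which is exactly the flexibility the abstract highlights.

```python
import numpy as np

def plug_and_play(x0, prox_data, denoise, iters=50, rho=1.0):
    """prox_data(z, rho): proximal map of the data-fidelity term.
    denoise(z): any denoiser, acting as the 'prior model'."""
    x = v = x0.copy()
    u = np.zeros_like(x0)
    for _ in range(iters):
        x = prox_data(v - u, rho)   # enforce consistency with the measurements
        v = denoise(x + u)          # plug in the prior via the denoiser
        u = u + x - v               # dual (scaled Lagrange multiplier) update
    return v
```

The paper's convergence conditions constrain what the denoiser may be; this sketch assumes a suitable choice, such as a nonlocal means variant.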

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A new energy on the vertices of a regularly sampled spatiotemporal bilateral grid is designed, which can be solved efficiently using a standard graph cut label assignment, and implicitly approximates long-range, spatio-temporal connections between pixels while still containing only a small number of variables and only local graph edges.
Abstract: In this work, we propose a novel approach to video segmentation that operates in bilateral space. We design a new energy on the vertices of a regularly sampled spatiotemporal bilateral grid, which can be solved efficiently using a standard graph cut label assignment. Using a bilateral formulation, the energy that we minimize implicitly approximates long-range, spatio-temporal connections between pixels while still containing only a small number of variables and only local graph edges. We compare to a number of recent methods, and show that our approach achieves state-of-the-art results on multiple benchmarks in a fraction of the runtime. Furthermore, our method scales linearly with image size, allowing for interactive feedback on real-world high resolution video.

Book ChapterDOI
08 Oct 2016
TL;DR: An automatic image matting method for portrait images that does not need user interaction is proposed and achieves comparable results with state-of-the-art methods that require specified foreground and background regions or pixels.
Abstract: We propose an automatic image matting method for portrait images. This method does not need user interaction, which was, however, essential in most previous approaches. To accomplish this goal, a new end-to-end convolutional neural network (CNN) based framework is proposed that takes a portrait image as input and outputs the matte result. Our method considers not only image semantic prediction but also pixel-level image matte optimization. A new portrait image dataset is constructed with our labeled matting ground truth. Our automatic method achieves comparable results with state-of-the-art methods that require specified foreground and background regions or pixels. Many applications are enabled given the automatic nature of our system.

Journal ArticleDOI
Puzhao Zhang, Maoguo Gong, Linzhi Su, Jia Liu, Li Zhizhou
TL;DR: This paper presents a novel multi-spatial-resolution change detection framework, which incorporates deep-architecture-based unsupervised feature learning and mapping-based feature change analysis, and tries to explore the inner relationships between them by building a mapping neural network.
Abstract: Multi-spatial-resolution change detection is a newly proposed issue and it is of great significance in remote sensing, environmental and land use monitoring, etc. Though the two images of a multi-spatial-resolution pair are representations of the same reality, they are often superficially incommensurable due to their different modalities and properties. In this paper, we present a novel multi-spatial-resolution change detection framework, which incorporates deep-architecture-based unsupervised feature learning and mapping-based feature change analysis. Firstly, we transform the multi-resolution image-pair into the same pixel resolution through co-registration, followed by detail recovery, which is designed to remedy the spatial details lost in the registration. Secondly, a denoising autoencoder is stacked to learn a local, high-level representation from the neighborhood of each given pixel, in an unsupervised fashion. Thirdly, motivated by the fact that the multi-resolution image-pair shares the same reality in the unchanged regions, we explore the inner relationships between them by building a mapping neural network, which learns a mapping function based on the most-unlikely-changed feature-pairs, selected from all feature-pairs via a coarse initial change map generated in advance. The learned mapping function can bridge the different representations and highlight changes. Finally, we build a robust and contractive change map through feature similarity analysis, and the change detection result is obtained through segmentation of the final change map. Experiments are carried out on four real datasets, and the results confirm the effectiveness and superiority of the proposed method.

Journal ArticleDOI
12 Jan 2016-ACS Nano
TL;DR: A plasmonic filter set with polarization-switchable color properties, based upon arrays of asymmetric cross-shaped nanoapertures in an aluminum thin-film, used to create micro image displays containing duality in their optical information states.
Abstract: Color filters based upon nanostructured metals have garnered significant interest in recent years, having been positioned as alternatives to the organic dye-based filters which provide color selectivity in image sensors, as nonfading “printing” technologies for producing images with nanometer pixel resolution, and as ultra-high-resolution, small foot-print optical storage and encoding solutions. Here, we demonstrate a plasmonic filter set with polarization-switchable color properties, based upon arrays of asymmetric cross-shaped nanoapertures in an aluminum thin-film. Acting as individual color-emitting nanopixels, the plasmonic cavity-apertures have dual-color selectivity, transmitting one of two visible colors, controlled by the polarization of the white light incident on the rear of the pixel and tuned by varying the critical dimensions of the geometry and periodicity of the array. This structural approach to switchable optical filtering enables a single nanoaperture to encode two information states wi...

Proceedings ArticleDOI
01 Jun 2016
TL;DR: An approach to long-range spatio-temporal regularization in semantic video segmentation by optimizing the mapping of pixels to a Euclidean feature space so as to minimize distances between corresponding points.
Abstract: We present an approach to long-range spatio-temporal regularization in semantic video segmentation. Temporal regularization in video is challenging because both the camera and the scene may be in motion. Thus Euclidean distance in the space-time volume is not a good proxy for correspondence. We optimize the mapping of pixels to a Euclidean feature space so as to minimize distances between corresponding points. Structured prediction is performed by a dense CRF that operates on the optimized features. Experimental results demonstrate that the presented approach increases the accuracy and temporal consistency of semantic video segmentation.

Journal ArticleDOI
TL;DR: This letter presents a novel change detection method for multitemporal synthetic aperture radar images based on PCANet that exploits representative neighborhood features from each pixel, using PCA filters as convolutional filters to generate change maps with fewer noise spots.
Abstract: This letter presents a novel change detection method for multitemporal synthetic aperture radar images based on PCANet. This method exploits representative neighborhood features from each pixel using PCA filters as convolutional filters. Thus, the proposed method is more robust to speckle noise and can generate change maps with fewer noise spots. Given two multitemporal images, Gabor wavelets and fuzzy c-means are utilized to select interested pixels that have a high probability of being changed or unchanged. Then, new image patches centered at interested pixels are generated and a PCANet model is trained using these patches. Finally, pixels in the multitemporal images are classified by the trained PCANet model. The PCANet classification result and the preclassification result are combined to form the final change map. The experimental results obtained on three real SAR image data sets confirm the effectiveness of the proposed method.
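A minimal sketch of the core PCANet ingredient mentioned here, learning PCA filters from neighborhood patches (a simplified single stage; following PCANet, each patch's mean is removed before the PCA):

```python
import numpy as np

def pca_filters(patches, n_filters=8):
    """patches: (N, k, k) array of neighborhood patches around chosen pixels;
    returns n_filters k x k convolutional filters."""
    X = patches.reshape(len(patches), -1).astype(np.float64)
    X -= X.mean(axis=1, keepdims=True)      # remove each patch's mean (DC)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    k = patches.shape[1]
    # leading principal directions, reshaped into k x k convolutional filters
    return Vt[:n_filters].reshape(n_filters, k, k)
```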

Journal ArticleDOI
01 Feb 2016-Optik
TL;DR: A method for detecting crack patterns in cement using image processing techniques; the advantage of this method is clear and accurate detection of cracks in images.

Journal ArticleDOI
TL;DR: Experimental results and security analysis demonstrate that the proposed algorithm offers high security and fast speed, and can resist various attacks.

Book ChapterDOI
08 Oct 2016
TL;DR: In this article, a novel family of "clockwork" convnets driven by fixed or adaptive clock signals is proposed to schedule the processing of different layers at different update rates according to their semantic stability.
Abstract: Recent years have seen tremendous progress in still-image segmentation; however the naive application of these state-of-the-art algorithms to every video frame requires considerable computation and ignores the temporal continuity inherent in video. We propose a video recognition framework that relies on two key observations: (1) while pixels may change rapidly from frame to frame, the semantic content of a scene evolves more slowly, and (2) execution can be viewed as an aspect of architecture, yielding purpose-fit computation schedules for networks. We define a novel family of “clockwork” convnets driven by fixed or adaptive clock signals that schedule the processing of different layers at different update rates according to their semantic stability. We design a pipeline schedule to reduce latency for real-time recognition and a fixed-rate schedule to reduce overall computation. Finally, we extend clockwork scheduling to adaptive video processing by incorporating data-driven clocks that can be tuned on unlabeled video. The accuracy and efficiency of clockwork convnets are evaluated on the Youtube-Objects, NYUD, and Cityscapes video datasets.
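A minimal sketch of the adaptive-clock idea (my simplification; `shallow` and `deep` stand for the lower and upper portions of a segmentation network, and the threshold is an illustrative stand-in for a tuned, data-driven clock): the deep layers are recomputed only when the cheap early features have changed enough, otherwise their cached output is reused.

```python
import numpy as np

def clockwork_video(frames, shallow, deep, thresh=0.1):
    """frames: iterable of arrays; shallow/deep: callables over features."""
    outputs, cached_deep, last_feat = [], None, None
    for f in frames:
        feat = shallow(f)                       # always run the cheap layers
        if last_feat is None or np.abs(feat - last_feat).mean() > thresh:
            cached_deep = deep(feat)            # clock fires: full update
            last_feat = feat
        outputs.append(cached_deep)             # else reuse persisted features
    return outputs
```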

Book ChapterDOI
Donggeun Yoo, Namil Kim, Sunggyun Park, Anthony S. Paek, In So Kweon
08 Oct 2016
TL;DR: The model transfers an input domain to a target domain at the semantic level and generates the target image at the pixel level, employing the real/fake-discriminator as in Generative Adversarial Nets to generate realistic target images.
Abstract: We present an image-conditional image generation model. The model transfers an input domain to a target domain at the semantic level, and generates the target image at the pixel level. To generate realistic target images, we employ the real/fake-discriminator as in Generative Adversarial Nets [6], but also introduce a novel domain-discriminator to make the generated image relevant to the input image. We verify our model through a challenging task of generating a piece of clothing from an input image of a dressed person. We present a high quality clothing dataset containing the two domains, and succeed in demonstrating decent results.