
Showing papers on "Pixel published in 2015"


Proceedings ArticleDOI
07 Jun 2015
TL;DR: In this paper, the authors define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, and use hypercolumns as pixel descriptors.
Abstract: Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse spatially to allow precise localization. Conversely, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation [22], where we improve state-of-the-art from 49.7 mean APr [22] to 60.0, keypoint localization, where we get a 3.3 point boost over [20], and part labeling, where we show a 6.6 point gain over a strong baseline.
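The per-pixel construction lends itself to a compact sketch. Below is a minimal, hedged illustration (not the authors' code): feature maps from several layers, assumed given as numpy arrays, are upsampled to image resolution and concatenated channel-wise, so each pixel gets one long descriptor.

```python
import numpy as np
from scipy.ndimage import zoom

def hypercolumns(feature_maps, out_h, out_w):
    """Build per-pixel hypercolumns from a list of CNN feature maps.

    feature_maps: list of (C_i, H_i, W_i) float arrays from different layers.
    Returns an (out_h, out_w, sum_i C_i) array: one descriptor per pixel.
    """
    upsampled = []
    for fm in feature_maps:
        _, h, w = fm.shape
        # Linear interpolation of every channel up to image resolution.
        upsampled.append(zoom(fm, (1, out_h / h, out_w / w), order=1))
    return np.concatenate(upsampled, axis=0).transpose(1, 2, 0)
```

A pixel's hypercolumn is then simply `hc[y, x]`, which downstream per-pixel classifiers can consume directly.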

1,511 citations


Journal ArticleDOI
B. Flaugher, H. T. Diehl, K. Honscheid, T. M. C. Abbott, O. Alvarez, R. Angstadt, J. Annis, M. Antonik, O. Ballester, L. Beaufore, Gary Bernstein, Rebecca A. Bernstein, B. Bigelow, Marco Bonati, D. Boprie, David J. Brooks, E. Buckley-Geer, J. Campa, Laia Cardiel-Sas, Francisco J. Castander, Javier Castilla, H. Cease, J. M. Cela-Ruiz, Steve Chappa, Edward C. Chi, C. Cooper, L. N. da Costa, E. Dede, G. Derylo, Darren L. DePoy, J. De Vicente, P. Doel, Alex Drlica-Wagner, J. Eiting, Ann Elliott, J. Emes, Juan Estrada, A. Fausti Neto, D. A. Finley, R. Flores, Josh Frieman, D. W. Gerdes, Michael D. Gladders, B. Gregory, G. Gutierrez, Jiangang Hao, S.E. Holland, Scott Holm, D. Huffman, Cheryl Jackson, David J. James, M. Jonas, Armin Karcher, I. Karliner, Steve Kent, Richard Kessler, Mark Kozlovsky, Richard G. Kron, Donna Kubik, K. Kuehn, S. E. Kuhlmann, K. Kuk, O. Lahav, A. Lathrop, J. Lee, Michael Levi, Peter Lewis, Tianjun Li, I. Mandrichenko, Jennifer L. Marshall, G. Martinez, K. W. Merritt, Ramon Miquel, F. Munoz, Eric H. Neilsen, Robert C. Nichol, Brian Nord, Ricardo L. C. Ogando, Jamieson Olsen, N. Palio, K. Patton, John Peoples, A. A. Plazas, J. Rauch, Kevin Reil, J.-P. Rheault, Natalie A. Roe, H. Rogers, A. Roodman, E. J. Sanchez, V. Scarpine, R. H. Schindler, Ricardo Schmidt, R. Schmitt, Michael Schubnell, Katherine Schultz, P. Schurter, L. Scott, S. Serrano, Terri Shaw, Robert Connon Smith, Marcelle Soares-Santos, A. Stefanik, W. Stuermer, E. Suchyta, A. Sypniewski, G. Tarle, Jon J Thaler, R. Tighe, C. Tran, Douglas L. Tucker, Alistair R. Walker, G. Wang, M. G. Watson, Curtis Weaverdyck, W. C. Wester, Robert J. Woods, B. Yanny 
TL;DR: The Dark Energy Camera as discussed by the authors was designed and constructed by the Dark Energy Survey Collaboration, and meets or exceeds the stringent requirements designed for the wide-field and supernova surveys for which the collaboration uses it.
Abstract: The Dark Energy Camera is a new imager with a 2.2-degree diameter field of view mounted at the prime focus of the Victor M. Blanco 4-meter telescope on Cerro Tololo near La Serena, Chile. The camera was designed and constructed by the Dark Energy Survey Collaboration, and meets or exceeds the stringent requirements designed for the wide-field and supernova surveys for which the collaboration uses it. The camera consists of a five-element optical corrector, seven filters, a shutter with a 60 cm aperture, and a CCD focal plane of 250-micron-thick fully depleted CCDs cooled inside a vacuum Dewar. The 570-megapixel focal plane comprises 62 2k × 4k CCDs for imaging and 12 2k × 2k CCDs for guiding and focus. The CCDs have 15 micron × 15 micron pixels with a plate scale of 0.263 arcsec per pixel. A hexapod system provides state-of-the-art focus and alignment capability. The camera is read out in 20 seconds with 6–9 electrons of readout noise. This paper provides a technical description of the camera's engineering, construction, installation, and current status.

715 citations


Journal ArticleDOI
TL;DR: In this article, a new underwater color image quality evaluation (UCIQE) metric is proposed to quantify the non-uniform color cast, blurring, and low contrast that characterize underwater engineering and monitoring images.
Abstract: Quality evaluation of underwater images is a key goal of underwater video image retrieval and intelligent processing. To date, no metric has been proposed for underwater color image quality evaluation (UCIQE). The special absorption and scattering characteristics of the water medium mean that natural color image quality metrics cannot be applied directly, especially across different underwater environments. In this paper, subjective testing for underwater image quality has been organized. The statistical distribution of the underwater image pixels in the CIELab color space related to subjective evaluation indicates that the sharpness and colorfulness factors correlate well with subjective image quality perception. Based on these findings, a new UCIQE metric, which is a linear combination of chroma, saturation, and contrast, is proposed to quantify the non-uniform color cast, blurring, and low contrast that characterize underwater engineering and monitoring images. Experiments are conducted to illustrate the performance of the proposed UCIQE metric and its capability to measure underwater image enhancement results. They show that the proposed metric has comparable performance to the leading natural color image quality metrics and the underwater grayscale image quality metrics available in the literature, and can predict with higher accuracy the relative amount of degradation with similar image content in underwater environments. Importantly, UCIQE is a simple and fast solution for real-time underwater video processing. The effectiveness of the presented measure is also demonstrated by subjective evaluation. The results show better correlation between the UCIQE and the subjective mean opinion score.
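Since the metric is a linear combination of three CIELab statistics, it is easy to sketch. The version below is an approximation, not the reference implementation: the coefficient values and the exact definitions of contrast and saturation are assumptions that should be checked against the paper.

```python
import numpy as np
from skimage import color

def uciqe_like(rgb, c=(0.4680, 0.2745, 0.2576)):
    """Approximate UCIQE-style score for an RGB image with values in [0, 1].

    Linear mix of: chroma standard deviation, luminance contrast
    (spread between darkest/brightest 1% of pixels), and mean saturation.
    Coefficients and statistic definitions are assumed, not verified.
    """
    lab = color.rgb2lab(rgb)
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    chroma = np.hypot(a, b)
    sigma_c = chroma.std()                       # chroma variability
    lo, hi = np.percentile(L, [1, 99])           # luminance contrast
    mu_s = (chroma / np.maximum(np.hypot(chroma, L), 1e-6)).mean()
    return c[0] * sigma_c + c[1] * (hi - lo) + c[2] * mu_s
```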

638 citations


Proceedings Article
27 May 2015
TL;DR: SegNet as mentioned in this paper is composed of a stack of encoders followed by a corresponding decoder stack that feeds into a soft-max classification layer; the decoders map low-resolution feature maps at the output of the encoder stack to full-input-image-size feature maps.
Abstract: We propose a novel deep architecture, SegNet, for semantic pixel-wise image labelling. SegNet has several attractive properties: (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions, (ii) with increasing depth, a larger context is considered for pixel labelling, which improves accuracy, and (iii) it is easy to visualise the effect of feature activation(s) in the pixel label space at any depth. SegNet is composed of a stack of encoders followed by a corresponding decoder stack which feeds into a soft-max classification layer. The decoders help map low-resolution feature maps at the output of the encoder stack to full input image size feature maps. This addresses an important drawback of recent deep learning approaches which have adopted networks designed for object categorization for pixel-wise labelling. These methods lack a mechanism to map deep layer feature maps to input dimensions. They resort to ad hoc methods to upsample features, e.g. by replication. This results in noisy predictions and also restricts the number of pooling layers in order to avoid too much upsampling, and thus reduces spatial context. SegNet overcomes these problems by learning to map encoder outputs to image pixel labels. We test the performance of SegNet on outdoor RGB scenes from CamVid and KITTI, and indoor scenes from the NYU dataset. Our results show that SegNet achieves state-of-the-art performance even without use of additional cues such as depth, video frames or post-processing with CRF models.
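A toy one-stage version of the encoder/decoder pattern can be sketched in PyTorch. This is an illustration only: the real network stacks many such blocks, and the exact decoder mechanics (e.g. whether stored max-pooling indices drive the upsampling, as in later SegNet variants) should be checked against the paper.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy one-stage encoder/decoder in the SegNet spirit: the decoder
    learns to map pooled encoder features back to input resolution."""
    def __init__(self, n_classes, ch=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1),
                                 nn.BatchNorm2d(ch), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)   # keep indices
        self.unpool = nn.MaxUnpool2d(2)                    # sparse upsample
        self.dec = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                 nn.BatchNorm2d(ch), nn.ReLU())
        self.classify = nn.Conv2d(ch, n_classes, 1)        # per-pixel logits

    def forward(self, x):
        f = self.enc(x)
        p, idx = self.pool(f)
        u = self.unpool(p, idx, output_size=f.shape)
        return self.classify(self.dec(u))   # soft-max applied in the loss
```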

580 citations


Proceedings ArticleDOI
01 Sep 2015
TL;DR: A novel Large-Scale Direct SLAM algorithm for stereo cameras (Stereo LSD-SLAM) that runs in real-time at high frame rate on standard CPUs, capable of handling aggressive brightness changes between frames - greatly improving the performance in realistic settings.
Abstract: We propose a novel Large-Scale Direct SLAM algorithm for stereo cameras (Stereo LSD-SLAM) that runs in real-time at high frame rate on standard CPUs. In contrast to sparse interest-point based methods, our approach aligns images directly based on the photoconsistency of all high-contrast pixels, including corners, edges and high texture areas. It concurrently estimates the depth at these pixels from two types of stereo cues: static stereo through the fixed-baseline stereo camera setup, as well as temporal multi-view stereo exploiting the camera motion. By incorporating both disparity sources, our algorithm can even estimate depth of pixels that are under-constrained when only using fixed-baseline stereo. Using a fixed baseline, on the other hand, avoids the scale-drift that typically occurs in pure monocular SLAM. We furthermore propose a robust approach to enforce illumination invariance, capable of handling aggressive brightness changes between frames, greatly improving performance in realistic settings. In experiments, we demonstrate state-of-the-art results on stereo SLAM benchmarks such as KITTI and challenging datasets from the EuRoC Challenge 3 for micro aerial vehicles.
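Direct alignment reduces to minimizing photometric residuals over the high-contrast pixels, and illumination invariance can be obtained by folding an affine brightness model into the residual. The sketch below illustrates that cost term only (pose parametrization and optimization omitted); the affine model and Huber weighting are assumptions in the spirit of the paper, not its exact formulation.

```python
import numpy as np

def photometric_residuals(i_ref, i_cur_warped, a=1.0, b=0.0, delta=10.0):
    """Hedged sketch of a direct-alignment cost with an affine brightness
    model (i_cur ~ a * i_ref + b) for robustness to brightness changes.

    i_ref, i_cur_warped: intensities sampled at the same high-contrast
    pixels, the second already warped by the current pose estimate.
    Returns Huber-weighted residuals whose squared sum would be
    minimized over the pose and the brightness parameters (a, b).
    """
    r = i_ref - (a * i_cur_warped + b)
    # Huber weights: quadratic near zero, linear in the tails.
    w = np.where(np.abs(r) <= delta, 1.0,
                 delta / np.maximum(np.abs(r), 1e-12))
    return w * r
```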

521 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed detector may outperform the traditional detection methods such as the classic Reed-Xiaoli (RX) algorithm, the kernel RX algorithm, and the state-of-the-art robust principal component analysis based and sparse-representation-based anomaly detectors, with low computational cost.
Abstract: In this paper, collaborative representation is proposed for anomaly detection in hyperspectral imagery. The algorithm is directly based on the concept that each pixel in the background can be approximately represented by its spatial neighborhoods, while anomalies cannot. The representation is assumed to be a linear combination of neighboring pixels, and the collaboration of representation is reinforced by ℓ2-norm minimization of the representation weight vector. To adjust the contribution of each neighboring pixel, a distance-weighted regularization matrix is included in the optimization problem, which has a simple and closed-form solution. By imposing the sum-to-one constraint on the weight vector, the stability of the solution can be enhanced. The major advantage of the proposed algorithm is the capability of adaptively modeling the background even when anomalous pixels are involved. A kernel extension of the proposed approach is also studied. Experimental results indicate that our proposed detector may outperform the traditional detection methods such as the classic Reed-Xiaoli (RX) algorithm, the kernel RX algorithm, and the state-of-the-art robust principal component analysis based and sparse-representation-based anomaly detectors, with low computational cost.
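The closed-form core is small enough to sketch. Below is a minimal per-pixel scoring function, assuming the guard-window neighborhood has already been gathered; the sum-to-one constraint is handled by simple renormalization rather than the paper's exact constrained solution.

```python
import numpy as np

def crd_score(y, neighbors, lam=0.01):
    """Hedged sketch of collaborative-representation anomaly scoring:
    a pixel is anomalous if its spatial neighbors reconstruct it poorly.

    y: (B,) spectrum of the test pixel.
    neighbors: (B, N) spectra of surrounding pixels (the center is
    excluded by a guard window in the full method).
    """
    # Distance-weighted Tikhonov regularizer: dissimilar neighbors count less.
    d = np.linalg.norm(neighbors - y[:, None], axis=0)
    A = neighbors.T @ neighbors + lam * np.diag(d ** 2)
    w = np.linalg.solve(A, neighbors.T @ y)       # closed-form weights
    w = w / w.sum()                               # sum-to-one (simplified)
    return np.linalg.norm(y - neighbors @ w)      # residual = anomaly score
```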

480 citations


Journal ArticleDOI
TL;DR: Experimental results show that the resulting algorithms produce images with better visual quality while reducing or avoiding halo artifacts in the final images, with a negligible increase in running time.
Abstract: It is known that local filtering-based edge-preserving smoothing techniques suffer from halo artifacts. In this paper, a weighted guided image filter (WGIF) is introduced by incorporating an edge-aware weighting into an existing guided image filter (GIF) to address the problem. The WGIF inherits the advantages of both global and local smoothing filters in the sense that: 1) the complexity of the WGIF is O(N) for an image with N pixels, the same as the GIF, and 2) the WGIF can avoid halo artifacts like the existing global smoothing filters. The WGIF is applied to single image detail enhancement, single image haze removal, and fusion of differently exposed images. Experimental results show that the resulting algorithms produce images with better visual quality while reducing or avoiding halo artifacts in the final images, with a negligible increase in running time.
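A compact sketch of the idea follows: a standard guided filter whose regularizer is shrunk at edges by an edge-aware weight, so edge pixels fit the guide more tightly. The particular weight used here (local variance relative to its global mean) is an assumption in the spirit of the paper, not its exact definition.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def wgif(guide, src, r=8, eps=1e-2):
    """Hedged sketch of a weighted guided filter on 2D float images.

    A plain guided filter whose regularizer eps is reduced where the
    guide has strong local variance (edges), preserving them better.
    """
    box = lambda x: uniform_filter(x, size=2 * r + 1)   # O(N) box means
    m_i, m_p = box(guide), box(src)
    var_i = box(guide * guide) - m_i * m_i
    cov_ip = box(guide * src) - m_i * m_p
    # Edge-aware weight: local variance relative to its global average.
    gamma = (var_i + 1e-9) / (var_i.mean() + 1e-9)
    a = cov_ip / (var_i + eps / gamma)                  # per-window slope
    b = m_p - a * m_i                                   # per-window offset
    return box(a) * guide + box(b)
```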

440 citations


Journal ArticleDOI
TL;DR: The proposed method fuses source images by weighted average, using weights computed from detail images extracted from the source images with the CBF; it shows good performance, and the visual quality of its fused images is superior to that of other methods.
Abstract: Like the bilateral filter (BF), the cross bilateral filter (CBF) considers both the gray-level similarities and the geometric closeness of neighboring pixels without smoothing edges, but it uses one image to compute the kernel and the other to filter, and vice versa. In this paper, it is proposed to fuse source images by weighted average using weights computed from the detail images that are extracted from the source images using the CBF. The performance of the proposed method has been verified on several pairs of multisensor and multifocus images and compared with existing methods both visually and quantitatively. It is found that none of the methods shows consistent performance across all the performance metrics; compared to them, however, the proposed method performs well in most cases. Further, the visual quality of the fused image produced by the proposed method is superior to that of the other methods.
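The weighting scheme is easy to illustrate. In the hedged sketch below a Gaussian blur stands in for the cross bilateral filter so the example stays self-contained; the paper's detail extraction and weight computation differ in detail.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse(src_a, src_b, sigma=2.0, eps=1e-9):
    """Detail-weighted fusion of two registered grayscale float images.

    A Gaussian blur is used here as a stand-in for the cross bilateral
    filter when extracting the detail layers.
    """
    detail_a = src_a - gaussian_filter(src_a, sigma)   # detail layer A
    detail_b = src_b - gaussian_filter(src_b, sigma)   # detail layer B
    # Weight each source by the local strength of its detail layer.
    w_a = gaussian_filter(np.abs(detail_a), sigma) + eps
    w_b = gaussian_filter(np.abs(detail_b), sigma) + eps
    return (w_a * src_a + w_b * src_b) / (w_a + w_b)
```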

417 citations


Proceedings ArticleDOI
07 Dec 2015
TL;DR: This work builds on the Patchmatch idea: starting from randomly generated 3D planes in scene space, the best-fitting planes are iteratively propagated and refined to obtain a 3D depth and normal field per view, such that a robust photo-consistency measure over all images is maximized.
Abstract: We present a new, massively parallel method for high-quality multiview matching. Our work builds on the Patchmatch idea: starting from randomly generated 3D planes in scene space, the best-fitting planes are iteratively propagated and refined to obtain a 3D depth and normal field per view, such that a robust photo-consistency measure over all images is maximized. Our main novelties are, on the one hand, formulating Patchmatch in scene space, which makes it possible to aggregate image similarity across multiple views and obtain more accurate depth maps, and, on the other, a modified, diffusion-like propagation scheme that can be massively parallelized and delivers dense multiview correspondence over ten 1.9-megapixel images in 3 seconds on a consumer-grade GPU. Our method uses a slanted support window and thus has no fronto-parallel bias; it is completely local and parallel, such that computation time scales linearly with image size and inversely with the number of parallel threads. Furthermore, it has a low memory footprint (four values per pixel, independent of the depth range). It therefore scales exceptionally well and can handle multiple large images at high depth resolution. Experiments on the DTU and Middlebury multiview datasets as well as oblique aerial images show that our method achieves very competitive results with high accuracy and completeness, across a range of different scenarios.

410 citations


Patent
30 Sep 2015
TL;DR: In this paper, an information processing method and electronic equipment address the prior-art problem of enhancing images whose contrast is too low, achieving improved image quality through effective contrast enhancement.
Abstract: The invention provides an information processing method and electronic equipment that address the prior-art problem of enhancing an image whose contrast is too low, achieving improved image quality through effective contrast enhancement. The information processing method comprises the steps of: acquiring a first image whose contrast is lower than a threshold; acquiring the first color value of at least one pixel point of the first image, where the first color value is composed of the value of each color channel of a first color space; calculating the color deviation of the first image and an enhancement coefficient to be applied to the first image; obtaining a new first color value for the at least one pixel point based on its first color value, the color deviation, and the enhancement coefficient; and obtaining a first optimized image corresponding to the first image based on the new first color value of the at least one pixel point.

374 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: The Materials in Context Database (MINC) as mentioned in this paper is a large-scale, open dataset of materials in the wild; the authors combine this dataset with deep learning to achieve material recognition and segmentation of images in the wild.
Abstract: Recognizing materials in real-world images is a challenging task. Real-world materials have rich surface texture, geometry, lighting conditions, and clutter, which combine to make the problem particularly difficult. In this paper, we introduce a new, large-scale, open dataset of materials in the wild, the Materials in Context Database (MINC), and combine this dataset with deep learning to achieve material recognition and segmentation of images in the wild. MINC is an order of magnitude larger than previous material databases, while being more diverse and well-sampled across its 23 categories. Using MINC, we train convolutional neural networks (CNNs) for two tasks: classifying materials from patches, and simultaneous material recognition and segmentation in full images. For patch-based classification on MINC we found that the best performing CNN architectures can achieve 85.2% mean class accuracy. We convert these trained CNN classifiers into an efficient fully convolutional framework combined with a fully connected conditional random field (CRF) to predict the material at every pixel in an image, achieving 73.1% mean class accuracy. Our experiments demonstrate that having a large, well-sampled dataset such as MINC is crucial for real-world material recognition and segmentation.

Journal ArticleDOI
TL;DR: This work demonstrates a single-photon imaging system based on a time-gated intensified camera from which the image of an object can be inferred from very few detected photons, and shows that a ghost-imaging configuration is a useful approach for obtaining images with high signal-to-noise ratios.
Abstract: Low-light-level imaging techniques have application in many diverse fields, ranging from biological sciences to security. A high-quality digital camera based on a multi-megapixel array will typically record an image by collecting of order 10^5 photons per pixel, but by how much could this photon flux be reduced? In this work we demonstrate a single-photon imaging system based on a time-gated intensified camera from which the image of an object can be inferred from very few detected photons. We show that a ghost-imaging configuration, where the image is obtained from photons that have never interacted with the object, is a useful approach for obtaining images with high signal-to-noise ratios. The use of heralded single photons ensures that the background counts can be virtually eliminated from the recorded images. By applying principles of image compression and associated image reconstruction, we obtain high-quality images of objects from raw data formed from an average of fewer than one detected photon per image pixel.

Journal ArticleDOI
TL;DR: A novel image fusion method for multi-focus images based on dense scale-invariant feature transform (SIFT), demonstrating the great potential of local image features such as dense SIFT for image fusion.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: By working directly on the whole image, the proposed CSC-SR algorithm does not need to divide the image into overlapped patches, and can exploit the image global correlation to produce more robust reconstruction of image local structures.
Abstract: Most of the previous sparse coding (SC) based super resolution (SR) methods partition the image into overlapped patches, and process each patch separately. These methods, however, ignore the consistency of pixels in overlapped patches, which is a strong constraint for image reconstruction. In this paper, we propose a convolutional sparse coding (CSC) based SR (CSC-SR) method to address the consistency issue. Our CSC-SR involves three groups of parameters to be learned: (i) a set of filters to decompose the low resolution (LR) image into LR sparse feature maps, (ii) a mapping function to predict the high resolution (HR) feature maps from the LR ones, and (iii) a set of filters to reconstruct the HR images from the predicted HR feature maps via simple convolution operations. By working directly on the whole image, the proposed CSC-SR algorithm does not need to divide the image into overlapped patches, and can exploit the image global correlation to produce more robust reconstruction of image local structures. Experimental results clearly validate the advantages of CSC over patch based SC in SR application. Compared with state-of-the-art SR methods, the proposed CSC-SR method achieves highly competitive PSNR results, while demonstrating better edge and texture preservation performance.
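The final reconstruction step (iii) is simple to illustrate: the HR image is synthesized by summing whole-image convolutions of learned filters with the predicted HR feature maps, with no patch partitioning anywhere. A minimal sketch, assuming the filters and predicted maps are given:

```python
import numpy as np
from scipy.signal import convolve2d

def csc_reconstruct(hr_feature_maps, hr_filters):
    """Sum of whole-image convolutions, as in step (iii) of CSC-SR.

    hr_feature_maps: list of (H, W) predicted sparse float feature maps.
    hr_filters: list of small (k, k) learned reconstruction filters.
    """
    out = np.zeros_like(hr_feature_maps[0], dtype=float)
    for z, f in zip(hr_feature_maps, hr_filters):
        out += convolve2d(z, f, mode="same")   # global, patch-free synthesis
    return out
```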

Journal ArticleDOI
TL;DR: Experimental results on three widely used real HSIs indicate that the proposed SC-MK approach outperforms several well-known classification methods.
Abstract: For the classification of hyperspectral images (HSIs), this paper presents a novel framework to effectively utilize the spectral–spatial information of superpixels via multiple kernels, which is termed as superpixel-based classification via multiple kernels (SC-MK). In the HSI, each superpixel can be regarded as a shape-adaptive region, which consists of a number of spatial neighboring pixels with very similar spectral characteristics. First, the proposed SC-MK method adopts an oversegmentation algorithm to cluster the HSI into many superpixels. Then, three kernels are separately employed for the utilization of the spectral information, as well as spatial information, within and among superpixels. Finally, the three kernels are combined together and incorporated into a support vector machine classifier. Experimental results on three widely used real HSIs indicate that the proposed SC-MK approach outperforms several well-known classification methods.
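The kernel-combination step can be sketched with scikit-learn's precomputed-kernel SVM. The three kernel matrices and the mixing weights are assumed inputs here; constructing them (oversegmentation, within- and among-superpixel similarities) is the paper's actual contribution and is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC

def sc_mk_predict(k_spec, k_within, k_among, y_train, idx_train,
                  w=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three precomputed (N, N) kernels and classify with an SVM.

    k_spec, k_within, k_among: spectral, within-superpixel and
    among-superpixel kernel matrices (assumed computed upstream).
    """
    k = w[0] * k_spec + w[1] * k_within + w[2] * k_among
    svm = SVC(kernel="precomputed")
    svm.fit(k[np.ix_(idx_train, idx_train)], y_train)
    return svm.predict(k[:, idx_train])   # labels for all samples
```

Equal mixing weights are used only for illustration; in practice they would be tuned or learned.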

Journal ArticleDOI
TL;DR: This paper presents a novel multi-focus image fusion method in the spatial domain that utilizes a dictionary learned from local patches of the source images and outperforms existing state-of-the-art methods in terms of visual and quantitative evaluations.

Journal ArticleDOI
TL;DR: An image processing toolbox that generates images that are linear with respect to radiance from the RAW files of numerous camera brands and can combine image channels from multispectral cameras, including additional ultraviolet photographs, which enables objective measures of reflectance and colour using a wide range of consumer cameras.
Abstract: Quantitative measurements of colour, pattern and morphology are vital to a growing range of disciplines. Digital cameras are readily available and already widely used for making these measurements, having numerous advantages over other techniques, such as spectrometry. However, off-the-shelf consumer cameras are designed to produce images for human viewing, meaning that their uncalibrated photographs cannot be used for making reliable, quantitative measurements. Many studies still fail to appreciate this, and of those scientists who are aware of such issues, many are hindered by a lack of usable tools for making objective measurements from photographs. We have developed an image processing toolbox that generates images that are linear with respect to radiance from the RAW files of numerous camera brands and can combine image channels from multispectral cameras, including additional ultraviolet photographs. Images are then normalised using one or more grey standards to control for lighting conditions. This enables objective measures of reflectance and colour using a wide range of consumer cameras. Furthermore, if the camera's spectral sensitivities are known, the software can convert images to correspond to the visual system (cone-catch values) of a wide range of animals, enabling human and non-human visual systems to be modelled. The toolbox also provides image analysis tools that can extract luminance (lightness), colour and pattern information. Furthermore, all processing is performed on 32-bit floating point images rather than commonly used 8-bit images. This increases precision and reduces the likelihood of data loss through rounding error or saturation of pixels, while also facilitating the measurement of objects with shiny or fluorescent properties. All cameras tested using this software were found to demonstrate a linear response within each image and across a range of exposure times. Cone-catch mapping functions were highly robust, converting images to several animal visual systems and yielding data that agreed closely with spectrometer-based estimates. Our imaging toolbox is freely available as an addition to the open source ImageJ software. We believe that it will considerably enhance the appropriate use of digital cameras across multiple areas of biology, in particular researchers aiming to quantify animal and plant visual signals.
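The grey-standard normalisation step described above is straightforward to sketch. Below is a minimal version, assuming a linearised image and a boolean mask over the grey-standard pixels; the 20% reflectance value is an assumed example.

```python
import numpy as np

def normalise(linear_img, grey_mask, grey_reflectance=0.20):
    """Rescale a linear image so the grey standard reads as its known
    reflectance, controlling for lighting conditions.

    linear_img: (H, W, C) image, already linear w.r.t. radiance.
    grey_mask: (H, W) boolean mask selecting the grey-standard pixels.
    """
    img = linear_img.astype(np.float32)         # 32-bit floats, as the toolbox uses
    standard = img[grey_mask].mean(axis=0)      # per-channel reading of the standard
    return img * (grey_reflectance / standard)  # output in reflectance units
```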

Patent
29 May 2015
TL;DR: In this article, the authors describe an imaging apparatus having an imaging assembly that includes an image sensor, which can capture frames of image data corresponding to different sets of pixels of the image sensor.
Abstract: There is described an imaging apparatus having an imaging assembly that includes an image sensor. The imaging apparatus can capture a frame of image data having image data corresponding to a first set of pixels of the image sensor. The imaging apparatus can capture a frame of image data having image data corresponding to a second set of pixels of the image sensor.

Patent
01 Jun 2015
TL;DR: In this article, the authors proposed methods for differential image quality enhancement in a detection system that includes multiple electromagnetic radiation detectors, which involve obtaining an image from a chemical-band electromagnetic radiation detector and an image from a reference-band electromagnetic radiation detector.
Abstract: Methods for differential image quality enhancement for a detection system including multiple electromagnetic radiation detectors which include obtaining an image from a chemical band electromagnetic radiation detector and an image from a reference band electromagnetic radiation detector. Each of the images includes a plurality of pixels, each pixel having an associated intensity value. One or more intensity values of a plurality of pixels from the reference band image are adjusted based on one or more intensity value parameters of the chemical band image.

Journal ArticleDOI
TL;DR: Assessing to what extent state-of-the-art supervised classification methods can be applied to high resolution multi-temporal optical imagery to produce accurate crop type maps at the global scale shows that a random forest classifier operating on linearly temporally gap-filled images can achieve overall accuracies above 80% for most sites.
Abstract: Crop area extent estimates and crop type maps provide crucial information for agricultural monitoring and management. Remote sensing imagery in general and, more specifically, the high temporal and high spatial resolution data that upcoming systems such as Sentinel-2 will provide, constitute a major asset for this kind of application. The goal of this paper is to assess to what extent state-of-the-art supervised classification methods can be applied to high resolution multi-temporal optical imagery to produce accurate crop type maps at the global scale. Five concurrent strategies for automatic crop type map production have been selected and benchmarked using SPOT4 (Take5) and Landsat 8 data over 12 test sites spread all over the globe (four in Europe, four in Africa, two in America and two in Asia). This variety of test sites allows one to draw conclusions applicable to a wide variety of landscapes and crop systems. The results show that a random forest classifier operating on linearly temporally gap-filled images can achieve overall accuracies above 80% for most sites. Only two sites showed low performance: Madagascar, due to the presence of fields smaller than the pixel size, and Burkina Faso, due to a mix of trees and crops in the fields. The approach is based on supervised machine learning techniques, which need in situ data collection for the training step, but the map production is fully automatic.
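The winning strategy is easy to sketch end to end: linearly interpolate each pixel's time series over cloudy or missing dates, then train a random forest. The sketch below assumes per-pixel feature rows and at least one valid observation per pixel; the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classify_crops(ts, valid, labels, train_idx):
    """Linear temporal gap-filling followed by random forest classification.

    ts: (n_pixels, n_dates) reflectance time series with gaps (e.g. clouds).
    valid: boolean mask of usable observations, same shape as ts.
    Assumes every pixel has at least one valid observation.
    """
    filled = ts.astype(float)
    dates = np.arange(ts.shape[1])
    for i in range(ts.shape[0]):                # interpolate over the gaps
        v = valid[i]
        filled[i] = np.interp(dates, dates[v], ts[i, v])
    rf = RandomForestClassifier(n_estimators=100)
    rf.fit(filled[train_idx], labels[train_idx])
    return rf.predict(filled)                   # a crop label per pixel
```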

Journal ArticleDOI
TL;DR: A tunable polarization-independent reflective surface where the colour of the surface is changed as a function of applied voltage is demonstrated, paving the way towards dynamic pixels for reflective displays.
Abstract: Structural colour arising from nanostructured metallic surfaces offers many benefits compared to conventional pigmentation based display technologies, such as increased resolution and scalability of their optical response with structure dimensions. However, once these structures are fabricated their optical characteristics remain static, limiting their potential application. Here, by using a specially designed nanostructured plasmonic surface in conjunction with high birefringence liquid crystals, we demonstrate a tunable polarization-independent reflective surface where the colour of the surface is changed as a function of applied voltage. A large range of colour tunability is achieved over previous reports by utilizing an engineered surface which allows full liquid crystal reorientation while maximizing the overlap between plasmonic fields and liquid crystal. In combination with imprinted structures of varying periods, a full range of colours spanning the entire visible spectrum is achieved, paving the way towards dynamic pixels for reflective displays.

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work first detects and segments object instances in the scene and then uses a convolutional neural network, trained on pixel surface normals in images containing renderings of synthetic objects, to predict the pose of each object.
Abstract: The goal of this work is to represent objects in an RGB-D scene with corresponding 3D models from a library. We approach this problem by first detecting and segmenting object instances in the scene and then using a convolutional neural network (CNN) to predict the pose of the object. This CNN is trained using pixel surface normals in images containing renderings of synthetic objects. When tested on real data, our method outperforms alternative algorithms trained on real data. We then use this coarse pose estimate along with the inferred pixel support to align a small number of prototypical models to the data, and place into the scene the model that fits best. We observe a 48% relative improvement in performance at the task of 3D detection over the current state-of-the-art [34], while being an order of magnitude faster.

Journal ArticleDOI
TL;DR: A novel stopping criterion is presented that terminates the iterative process, leading to higher vessel segmentation accuracy; the algorithm is robust to the rate of new vessel pixel addition.
Abstract: This paper presents a novel unsupervised iterative blood vessel segmentation algorithm using fundus images. First, a vessel enhanced image is generated by tophat reconstruction of the negative green plane image. An initial estimate of the segmented vasculature is extracted by global thresholding the vessel enhanced image. Next, new vessel pixels are identified iteratively by adaptive thresholding of the residual image generated by masking out the existing segmented vessel estimate from the vessel enhanced image. The new vessel pixels are, then, region grown into the existing vessel, thereby resulting in an iterative enhancement of the segmented vessel structure. As the iterations progress, the number of false edge pixels identified as new vessel pixels increases compared to the number of actual vessel pixels. A key contribution of this paper is a novel stopping criterion that terminates the iterative process leading to higher vessel segmentation accuracy. This iterative algorithm is robust to the rate of new vessel pixel addition since it achieves 93.2–95.35% vessel segmentation accuracy with 0.9577–0.9638 area under ROC curve (AUC) on abnormal retinal images from the STARE dataset. The proposed algorithm is computationally efficient and consistent in vessel segmentation performance for retinal images with variations due to pathology, uneven illumination, pigmentation, and fields of view since it achieves a vessel segmentation accuracy of about 95% in an average time of 2.45, 3.95, and 8 s on images from three public datasets DRIVE, STARE, and CHASE_DB1, respectively. Additionally, the proposed algorithm has more than 90% segmentation accuracy for segmenting peripapillary blood vessels in the images from the DRIVE and CHASE_DB1 datasets.
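The iterate-mask-threshold loop reads naturally as pseudocode. Below is a hedged sketch; the adaptive threshold and the stopping rule shown here are placeholders for the paper's more principled versions, and the region-growing step is omitted.

```python
import numpy as np

def iterative_vessels(enhanced, init_mask, n_iter=10, k=1.5):
    """Iteratively add vessel pixels from the residual of a vessel-enhanced
    image, stopping when additions stop shrinking (crude stand-in for the
    paper's stopping criterion).

    enhanced: nonnegative vessel-enhanced float image.
    init_mask: boolean mask from initial global thresholding.
    """
    mask = init_mask.copy()
    prev_added = np.inf
    for _ in range(n_iter):
        residual = np.where(mask, 0.0, enhanced)     # hide known vessels
        vals = residual[residual > 0]
        t = vals.mean() + k * vals.std()             # adaptive threshold
        new = (residual > t) & ~mask
        if new.sum() >= prev_added:                  # placeholder stopping rule
            break
        mask |= new                                  # region growing omitted
        prev_added = new.sum()
    return mask
```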

Journal ArticleDOI
TL;DR: A novel method of evaluating the complexity of image blocks, which considers multiple neighboring pixels according to the locations of different pixels; this can increase the correctness of data extraction and image recovery and reduce the average extracted-bit error rate when the block size is appropriate.

Journal ArticleDOI
TL;DR: It is demonstrated that Random Forest regression-voting can be used to generate high quality response images quickly and leads to fast and accurate shape model matching when applied in the Constrained Local Model framework.
Abstract: A widely used approach for locating points on deformable objects in images is to generate feature response images for each point, and then to fit a shape model to these response images. We demonstrate that Random Forest regression-voting can be used to generate high quality response images quickly. Rather than using a generative or a discriminative model to evaluate each pixel, a regressor is used to cast votes for the optimal position of each point. We show that this leads to fast and accurate shape model matching when applied in the Constrained Local Model framework. We evaluate the technique in detail, and compare it with a range of commonly used alternatives across application areas: the annotation of the joints of the hands in radiographs and the detection of feature points in facial images. We show that our approach outperforms alternative techniques, achieving what we believe to be the most accurate results yet published for hand joint annotation and state-of-the-art performance for facial feature point detection.
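The voting step that turns regressor outputs into a response image takes only a few lines. A minimal sketch, assuming a trained forest has already produced one offset prediction per sampled patch:

```python
import numpy as np

def accumulate_votes(positions, predicted_offsets, shape):
    """Accumulate regression votes into a response image.

    positions: (M, 2) integer (y, x) centres of sampled patches.
    predicted_offsets: (M, 2) regressor-predicted offsets to the landmark.
    Returns the peak location (the landmark estimate) and the response map.
    """
    response = np.zeros(shape)
    votes = np.round(positions + predicted_offsets).astype(int)
    for y, x in votes:
        if 0 <= y < shape[0] and 0 <= x < shape[1]:
            response[y, x] += 1                 # one vote per regressor
    return np.unravel_index(response.argmax(), shape), response
```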

Posted Content
TL;DR: PyraMiD-LSTM as mentioned in this paper re-arranges the traditional cuboid order of computations in MD-RNNs in pyramidal fashion, which is easy to parallelize, especially for stacks of brain slice images.
Abstract: Convolutional Neural Networks (CNNs) can be shifted across 2D images or 3D videos to segment them. They have a fixed input size and typically perceive only small local contexts of the pixels to be classified as foreground or background. In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in a few sweeps through all pixels, especially when the RNN is a Long Short-Term Memory (LSTM). Despite these theoretical advantages, however, unlike CNNs, previous MD-LSTM variants were hard to parallelize on GPUs. Here we re-arrange the traditional cuboid order of computations in MD-LSTM in pyramidal fashion. The resulting PyraMiD-LSTM is easy to parallelize, especially for 3D data such as stacks of brain slice images. PyraMiD-LSTM achieved best known pixel-wise brain image segmentation results on MRBrainS13 (and competitive results on EM-ISBI12).

Journal ArticleDOI
TL;DR: A novel algorithm that combines sparse and collaborative representation is proposed for target detection in hyperspectral imagery; it outperforms existing target detection algorithms such as the adaptive coherence estimator and a pure sparse-representation-based detector.

Book ChapterDOI
Yuanpu Xie1, Fuyong Xing1, Xiangfei Kong1, Hai Su1, Lin Yang1 
05 Oct 2015
TL;DR: A novel convolutional neural network (CNN) based structured regression model is presented, which is shown to be able to handle touching cells, inhomogeneous background noises, and large variations in sizes and shapes.
Abstract: Robust cell detection serves as a critical prerequisite for many biomedical image analysis applications. In this paper, we present a novel convolutional neural network (CNN) based structured regression model, which is shown to be able to handle touching cells, inhomogeneous background noises, and large variations in sizes and shapes. The proposed method only requires a few training images with weak annotations (just one click near the center of the object). Given an input image patch, instead of providing a single class label like many traditional methods, our algorithm will generate the structured outputs (referred to as proximity patches). These proximity patches, which exhibit higher values for pixels near cell centers, will then be gathered from all testing image patches and fused to obtain the final proximity map, where the maximum positions indicate the cell centroids. The algorithm is tested using three data sets representing different image stains and modalities. The comparative experiments demonstrate the superior performance of this novel method over existing state-of-the-art.
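The structured targets are simple to generate from the weak click annotations. Below is a hedged sketch of one plausible construction; the exact decay law and radius are assumptions, not the paper's formula.

```python
import numpy as np

def proximity_patch(shape, centers, radius=5, alpha=0.8):
    """Build a proximity-map training target from clicked cell centres.

    Values decay with distance to the nearest annotated centre and are
    zero beyond a small radius, so maxima mark cell centroids.
    """
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    target = np.zeros(shape)
    for cy, cx in centers:                      # one click per cell
        d = np.hypot(yy - cy, xx - cx)
        target = np.maximum(target, np.exp(-alpha * d) * (d <= radius))
    return target
```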

Proceedings Article
01 May 2015
TL;DR: Polarization-based modulation, which is flicker-free, is proposed to enable low-pulse-rate VLC, making VLP lightweight enough even for resource-constrained wearable devices, e.g. smart glasses.
Abstract: Visible Light Positioning (VLP) provides a promising means to achieve indoor localization with sub-meter accuracy. We observe that the Visible Light Communication (VLC) methods in existing VLP systems rely on intensity-based modulation, and thus they require a high pulse rate to prevent flickering. However, the high pulse rate adds an unnecessary and heavy burden to receiving devices. To eliminate this burden, we propose polarization-based modulation, which is flicker-free, to enable a low pulse rate VLC. In this way, we make VLP lightweight enough even for resource-constrained wearable devices, e.g. smart glasses. Moreover, the polarization-based VLC can be applied to any illuminating light source, thereby eliminating the dependency on LEDs. This paper presents the VLP system PIXEL, which realizes our idea. In PIXEL, we develop three techniques, each of which addresses a design challenge: 1) a novel color-based modulation scheme to handle users' mobility, 2) an adaptive downsampling algorithm to tackle the uneven sampling problem of wearables' low-cost cameras and 3) a computational optimization method for the positioning algorithm to enable real-time processing. We implement PIXEL's hardware using commodity components and develop a software program for both a smartphone and Google Glass. Our experiments based on the prototype show that PIXEL can provide accurate real-time VLP for wearables and smartphones with camera resolution as coarse as 60 pixels × 80 pixels and CPU frequency as low as 300 MHz.

Journal ArticleDOI
27 Jul 2015
TL;DR: An image transform based on the L1 norm for piecewise image flattening that can effectively preserve and sharpen salient edges and contours while eliminating insignificant details, producing a nearly piecewise constant image with sparse structures is introduced.
Abstract: Identifying sparse salient structures from dense pixels is a longstanding problem in visual computing. Solutions to this problem can benefit both image manipulation and understanding. In this paper, we introduce an image transform based on the L1 norm for piecewise image flattening. This transform can effectively preserve and sharpen salient edges and contours while eliminating insignificant details, producing a nearly piecewise constant image with sparse structures. A variant of this image transform can perform edge-preserving smoothing more effectively than existing state-of-the-art algorithms. We further present a new method for complex scene-level intrinsic image decomposition. Our method relies on the above image transform to suppress surface shading variations, and perform probabilistic reflectance clustering on the flattened image instead of the original input image to achieve higher accuracy. Extensive testing on the Intrinsic-Images-in-the-Wild database indicates our method can perform significantly better than existing techniques both visually and numerically. The obtained intrinsic images have been successfully used in two applications, surface retexturing and 3D object compositing in photographs.
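As a rough, hedged illustration of L1-based flattening: classic total-variation denoising also minimizes an L1 norm of image gradients and produces a similar piecewise-constant look, though the paper's transform and solver differ. A stand-in using scikit-image (grayscale input assumed):

```python
from skimage import restoration

def flatten_like(img, weight=0.2):
    """TV denoising as a stand-in for L1 piecewise flattening.

    img: 2D grayscale float array. Larger weight -> flatter output.
    Note: an analogous classic method, not the paper's actual transform.
    """
    return restoration.denoise_tv_chambolle(img, weight=weight)
```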