Showing papers on "Kernel (image processing)" published in 2011


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper proposes a generic and simple framework comprising three steps: constructing a cost volume, fast cost volume filtering, and winner-take-all label selection. It achieves state-of-the-art results, including real-time disparity maps and optical flow fields with very fine structures as well as large displacements.
Abstract: Many computer vision tasks can be formulated as labeling problems. The desired solution is often a spatially smooth labeling where label transitions are aligned with color edges of the input image. We show that such solutions can be efficiently achieved by smoothing the label costs with a very fast edge preserving filter. In this paper we propose a generic and simple framework comprising three steps: (i) constructing a cost volume (ii) fast cost volume filtering and (iii) winner-take-all label selection. Our main contribution is to show that with such a simple framework state-of-the-art results can be achieved for several computer vision applications. In particular, we achieve (i) disparity maps in real-time, whose quality exceeds those of all other fast (local) approaches on the Middlebury stereo benchmark, and (ii) optical flow fields with very fine structures as well as large displacements. To demonstrate robustness, the few parameters of our framework are set to nearly identical values for both applications. Also, competitive results for interactive image segmentation are presented. With this work, we hope to inspire other researchers to leverage this framework to other application areas.
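
Since the abstract spells out the three steps, a toy sketch is easy to give. The Python fragment below is illustrative only: it uses a Gaussian filter from SciPy as a stand-in for the paper's edge-preserving guided filter, and an absolute-difference matching cost; all sizes and parameters are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def stereo_wta(left, right, max_disp=16, sigma=2.0):
    """Cost-volume filtering for stereo: (i) build a cost volume,
    (ii) smooth each cost slice, (iii) winner-take-all label selection.
    `left`/`right` are float grayscale images in [0, 1]."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), 1.0)
    for d in range(max_disp):                       # (i) cost volume
        shifted = np.roll(right, d, axis=1)         # right image shifted by d
        cost[d] = np.abs(left - shifted)            # absolute-difference cost
        cost[d, :, :d] = 1.0                        # wrapped columns: fallback cost
    for d in range(max_disp):                       # (ii) filter each slice;
        cost[d] = gaussian_filter(cost[d], sigma)   # the paper uses a guided filter
    return np.argmin(cost, axis=0)                  # (iii) winner-take-all

left = np.random.rand(64, 64)
right = np.roll(left, -4, axis=1)                   # synthetic constant disparity of 4
disparity = stereo_wta(left, right, max_disp=8)     # mostly 4 away from the border
```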

898 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper introduces ARC-t, a flexible model for supervised learning of non-linear transformations between domains, based on a novel theoretical result demonstrating that such transformations can be learned in kernel space.
Abstract: In real-world applications, “what you saw” during training is often not “what you get” during deployment: the distribution and even the type and dimensionality of features can change from one dataset to the next. In this paper, we address the problem of visual domain adaptation for transferring object models from one dataset or visual domain to another. We introduce ARC-t, a flexible model for supervised learning of non-linear transformations between domains. Our method is based on a novel theoretical result demonstrating that such transformations can be learned in kernel space. Unlike existing work, our model is not restricted to symmetric transformations, nor to features of the same type and dimensionality, making it applicable to a significantly wider set of adaptation scenarios than previous methods. Furthermore, the method can be applied to categories that were not available during training. We demonstrate the ability of our method to adapt object recognition models under a variety of situations, such as differing imaging conditions, feature types and codebooks.

803 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper derives a simple approximated MAP_k algorithm that involves only a modest modification of common MAP_{x,k} algorithms, and shows that MAP_k can, in fact, be optimized easily, with no additional computational complexity.
Abstract: In blind deconvolution one aims to estimate from an input blurred image y a sharp image x and an unknown blur kernel k. Recent research shows that a key to success is to consider the overall shape of the posterior distribution p(x, k | y) and not only its mode. This leads to a distinction between MAP_{x,k} strategies, which estimate the mode pair (x, k) and often lead to undesired results, and MAP_k strategies, which select the best k while marginalizing over all possible x images. The MAP_k principle is significantly more robust than the MAP_{x,k} one; yet, it involves a challenging marginalization over latent images. As a result, MAP_k techniques are considered complicated and have not been widely exploited. This paper derives a simple approximated MAP_k algorithm which involves only a modest modification of common MAP_{x,k} algorithms. We show that MAP_k can, in fact, be optimized easily, with no additional computational complexity.
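
For orientation, here is a toy Python sketch of the common MAP_{x,k} alternating scheme that the paper's approximated MAP_k algorithm modifies. The quadratic priors, Fourier-domain solves, and periodic boundaries are simplifying assumptions, not the paper's formulation; as the analysis paper further down this page notes, this naive scheme tends toward no-blur solutions without stronger priors.

```python
import numpy as np

def map_xk_alternating(y, iters=20, lam=1e-2, gamma=1e-2):
    """Naive MAP_{x,k}-style baseline (toy): alternately solve quadratic
    problems for the latent image x and the kernel k in the Fourier
    domain, assuming periodic boundaries and ridge (L2) priors."""
    Y = np.fft.fft2(y)
    k = np.zeros(y.shape); k[0, 0] = 1.0                # delta-kernel init
    for _ in range(iters):
        K = np.fft.fft2(k)
        X = np.conj(K) * Y / (np.abs(K) ** 2 + lam)     # x-step: ridge deconvolution
        Kn = np.conj(X) * Y / (np.abs(X) ** 2 + gamma)  # k-step: ridge kernel fit
        k = np.real(np.fft.ifft2(Kn))
        k = np.clip(k, 0, None); k /= k.sum() + 1e-12   # project: nonneg, sums to 1
    return np.real(np.fft.ifft2(X)), k

y = np.random.rand(64, 64)          # stand-in for a blurred observation
x_est, k_est = map_xk_alternating(y)
```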

623 citations


Journal ArticleDOI
TL;DR: In this paper, an adaptive convolution of Gaussian white noise with a real-space transfer function kernel, together with an adaptive multi-grid Poisson solver, is used to generate displacements and velocities following first-order (1LPT) or second-order (2LPT) Lagrangian perturbation theory.
Abstract: We discuss a new algorithm to generate multi-scale initial conditions with multiple levels of refinement for cosmological 'zoom-in' simulations. The method uses an adaptive convolution of Gaussian white noise with a real-space transfer function kernel, together with an adaptive multi-grid Poisson solver, to generate displacements and velocities following first-order (1LPT) or second-order (2LPT) Lagrangian perturbation theory. The new algorithm achieves rms relative errors of the order of 10^-4 for displacements and velocities in the refinement region and thus improves in terms of errors by about two orders of magnitude over previous approaches. In addition, errors are localized at coarse-fine boundaries and do not suffer from Fourier-space-induced interference ringing. An optional hybrid multi-grid and Fast Fourier Transform (FFT) based scheme is introduced which has identical Fourier-space behaviour to traditional approaches. Using a suite of re-simulations of a galaxy cluster halo, our real-space-based approach is found to reproduce correlation functions, density profiles, key halo properties and subhalo abundances with per cent level accuracy. Finally, we generalize our approach for two-component baryon and dark-matter simulations and demonstrate that the power spectrum evolution is in excellent agreement with linear perturbation theory. For initial baryon density fields, we suggest using the local Lagrangian approximation to generate a density field for mesh-based codes that is consistent with Lagrangian perturbation theory, instead of the current practice of using Eulerian linearly scaled densities.
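
The core construction, convolving white noise with a transfer-function kernel, can be illustrated in a few lines. This 1D NumPy sketch uses the FFT-based route that the paper contrasts with its adaptive real-space convolution; the toy transfer function is an assumption.

```python
import numpy as np

# Minimal 1D illustration: convolve Gaussian white noise with a
# transfer-function kernel so the output field has the desired power
# spectrum P(f) = |T(f)|^2. Shapes and units are illustrative.
n = 4096
rng = np.random.default_rng(0)
noise = rng.standard_normal(n)                    # white noise, unit variance
f = np.fft.rfftfreq(n)                            # sample frequencies
T = np.exp(-(f / 0.05) ** 2)                      # toy transfer function
field = np.fft.irfft(np.fft.rfft(noise) * T, n)   # convolution via FFT

# The measured spectrum of `field` follows |T|^2 up to sampling scatter.
```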

564 citations


Journal ArticleDOI
TL;DR: The previously reported failure of the naive MAP approach is explained by demonstrating that it mostly favors no-blur explanations; it is shown that, using reasonable image priors, a naive simultaneous MAP estimation of both latent image and blur kernel is guaranteed to fail even with infinitely large images sampled from the prior.
Abstract: Blind deconvolution is the recovery of a sharp version of a blurred image when the blur kernel is unknown. Recent algorithms have afforded dramatic progress, yet many aspects of the problem remain challenging and hard to understand. The goal of this paper is to analyze and evaluate recent blind deconvolution algorithms both theoretically and experimentally. We explain the previously reported failure of the naive MAP approach by demonstrating that it mostly favors no-blur explanations. We show that, using reasonable image priors, a naive simultaneous MAP estimation of both latent image and blur kernel is guaranteed to fail even with infinitely large images sampled from the prior. On the other hand, we show that since the kernel size is often smaller than the image size, a MAP estimation of the kernel alone is well constrained and is guaranteed to recover the true blur. The plethora of recent deconvolution techniques makes an experimental evaluation on ground-truth data important. As a first step toward this evaluation, we have collected blur data with ground truth and compared recent algorithms under equal settings. Additionally, our data demonstrate that the shift-invariant blur assumption made by most algorithms is often violated.

416 citations


Proceedings ArticleDOI
05 Dec 2011
TL;DR: A set of kernel features on depth images that model size, 3D shape, and depth edges in a single framework; these features significantly improve depth and RGB-D (color+depth) recognition, achieving a 10–15% improvement in accuracy over the state of the art.
Abstract: Consumer depth cameras, such as the Microsoft Kinect, are capable of providing frames of dense depth values in real time. One fundamental question in utilizing depth cameras is how to best extract features from depth frames. Motivated by local descriptors on images, in particular kernel descriptors, we develop a set of kernel features on depth images that model size, 3D shape, and depth edges in a single framework. Through extensive experiments on object recognition, we show that (1) our local features capture different aspects of cues from a depth frame/view that complement one another; (2) our kernel features significantly outperform traditional 3D features (e.g. Spin images); and (3) we significantly improve the capabilities of depth and RGB-D (color+depth) recognition, achieving 10–15% improvement in accuracy over the state of the art.

338 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This algorithm is used to construct a kernel appropriate for SVM-based image classification, and experiments with the Caltech 101, Caltech 256, and Scenes datasets demonstrate performance that matches or exceeds the state of the art for methods using a single type of features.
Abstract: This paper addresses the problem of category-level image classification. The underlying image model is a graph whose nodes correspond to a dense set of regions, and edges reflect the underlying grid structure of the image and act as springs to guarantee the geometric consistency of nearby regions during matching. A fast approximate algorithm for matching the graphs associated with two images is presented. This algorithm is used to construct a kernel appropriate for SVM-based image classification, and experiments with the Caltech 101, Caltech 256, and Scenes datasets demonstrate performance that matches or exceeds the state of the art for methods using a single type of features.

279 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: Hierarchical kernel descriptors are proposed that apply kernel descriptors recursively to form image-level features, providing a conceptually simple and consistent way to generate image-level features from pixel attributes.
Abstract: Kernel descriptors [1] provide a unified way to generate rich visual feature sets by turning pixel attributes into patch-level features, and yield impressive results on many object recognition tasks. However, best results with kernel descriptors are achieved using efficient match kernels in conjunction with nonlinear SVMs, which makes them impractical for large-scale problems. In this paper, we propose hierarchical kernel descriptors that apply kernel descriptors recursively to form image-level features and thus provide a conceptually simple and consistent way to generate image-level features from pixel attributes. More importantly, hierarchical kernel descriptors allow linear SVMs to yield state-of-the-art accuracy while being scalable to large datasets. They can also be naturally extended to extract features over depth images. We evaluate hierarchical kernel descriptors on both the CIFAR10 dataset and the new RGB-D Object Dataset consisting of segmented RGB and depth images of 300 everyday objects.

261 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: A novel sparse representation model called centralized sparse representation (CSR) is proposed, which achieves convincing improvement over previous state-of-the-art methods on image restoration tasks by exploiting the nonlocal image statistics.
Abstract: This paper proposes a novel sparse representation model called centralized sparse representation (CSR) for image restoration tasks. For faithful image reconstruction, the sparse coding coefficients of the degraded image should be as close as possible to those of the unknown original image with the given dictionary. However, since the available data are the degraded (noisy, blurred and/or down-sampled) versions of the original image, the sparse coding coefficients are often not accurate enough if only the local sparsity of the image is considered, as in many existing sparse representation models. To make the sparse coding more accurate, a centralized sparsity constraint is introduced by exploiting the nonlocal image statistics. The local sparsity and the nonlocal sparsity constraints are unified into a variational framework for optimization. Extensive experiments on image restoration validate that our CSR model achieves convincing improvement over previous state-of-the-art methods.
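
A minimal sketch of the centralizing idea follows, assuming a quadratic centralizing penalty (the paper's formulation and optimization differ in detail) and a plain ISTA solver; the dictionary D, the nonlocal estimate beta, and all constants are illustrative.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator (the L1 proximal map)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def centralized_sparse_code(y, D, beta, lam=0.05, gamma=0.1, iters=200):
    """ISTA-style sketch of centralized sparse coding: local sparsity (L1)
    plus a centralizing pull toward `beta`, a nonlocal estimate of the
    coefficients; here the centralizing term is quadratic for simplicity."""
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2 + 2 * gamma       # Lipschitz bound of smooth part
    for _ in range(iters):
        grad = D.T @ (D @ a - y) + 2 * gamma * (a - beta)
        a = soft(a - grad / L, lam / L)             # gradient step + L1 prox
    return a

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128)); D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(128); a_true[[3, 40, 90]] = [1.0, -0.7, 0.5]
y = D @ a_true + 0.01 * rng.standard_normal(64)
a = centralized_sparse_code(y, D, beta=np.zeros(128))   # recovers a sparse code
```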

260 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A Bayesian approach to adaptive video super resolution via simultaneously estimating underlying motion, blur kernel and noise level while reconstructing the original high-res frames is proposed.
Abstract: Although multi-frame super resolution has been extensively studied in past decades, super resolving real-world video sequences still remains challenging. In existing systems, either the motion models are oversimplified, or important factors such as blur kernel and noise level are assumed to be known. Such models cannot deal with the scene and imaging conditions that vary from one sequence to another. In this paper, we propose a Bayesian approach to adaptive video super resolution via simultaneously estimating underlying motion, blur kernel and noise level while reconstructing the original high-res frames. As a result, our system not only produces very promising super resolution results that outperform the state of the art, but also adapts to a variety of noise levels and blur kernels. Theoretical analysis of the relationship between blur kernel, noise level and frequency-wise reconstruction rate is also provided, consistent with our experimental results.

260 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper presents a "blind" image quality measure, where potentially neither the groundtruth image nor the degradation process is known, and uses a set of novel low-level image features in a machine learning framework to learn a mapping from these features to subjective image quality scores.
Abstract: It is often desirable to evaluate an image based on its quality. For many computer vision applications, a perceptually meaningful measure is the most relevant for evaluation; however, most commonly used measures do not map well to human judgements of image quality. A further complication of many existing image measures is that they require a reference image, which is often not available in practice. In this paper, we present a "blind" image quality measure, where potentially neither the groundtruth image nor the degradation process is known. Our method uses a set of novel low-level image features in a machine learning framework to learn a mapping from these features to subjective image quality scores. The image quality features stem from natural image measures and texture statistics. Experiments on a standard image quality benchmark dataset show that our method outperforms the current state of the art.
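
As a hedged illustration of the features-to-scores pipeline only (neither the paper's feature set nor its learner), one might write:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def iqa_features(img):
    """Illustrative low-level features (assumptions, not the paper's):
    gradient-magnitude statistics and simple contrast measures, in the
    spirit of natural-image and texture statistics."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    return np.array([mag.mean(), mag.std(), img.std(),
                     np.abs(img - img.mean()).mean()])

# Learn a mapping from features to subjective scores on a labeled set
# (synthetic stand-ins below; real training needs human opinion scores).
rng = np.random.default_rng(0)
imgs = [rng.random((32, 32)) for _ in range(50)]
scores = rng.random(50)                               # placeholder MOS labels
X = np.stack([iqa_features(im) for im in imgs])
model = RandomForestRegressor(n_estimators=50).fit(X, scores)
print(model.predict(X[:1]))                           # predicted quality score
```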

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper proposes a novel approach for unsupervised integration of such heterogeneous features, performing multi-modal spectral clustering on unlabeled, unsegmented images using a commonly shared graph Laplacian matrix.
Abstract: In recent years, more and more visual descriptors have been proposed to describe objects and scenes appearing in images. Different features describe different aspects of the visual characteristics. How to combine these heterogeneous features has become an increasingly critical problem. In this paper, we propose a novel approach to integrate such heterogeneous features in an unsupervised manner by performing multi-modal spectral clustering on unlabeled and unsegmented images. Considering each type of feature as one modality, our new multi-modal spectral clustering (MMSC) algorithm learns a commonly shared graph Laplacian matrix by unifying the different modalities (image features). A non-negative relaxation is also added to our method to improve the robustness and efficiency of image clustering. We applied our MMSC method to integrate five types of popularly used image features (SIFT, HOG, GIST, LBP, and CENTRIST) and evaluated the performance on two benchmark data sets: Caltech-101 and MSRC-v1. Compared with existing unsupervised scene and object categorization methods, our approach consistently achieves superior performance as measured by three standard clustering evaluation metrics.
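
A simplified stand-in for the MMSC idea: where the paper learns the shared graph Laplacian, the sketch below just averages per-modality normalized Laplacians before spectral clustering; all data and parameters are toy assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def shared_spectral_clustering(features_per_modal, n_clusters, sigma=1.0):
    """Build a normalized graph Laplacian per feature modality, average
    them into one shared Laplacian (MMSC instead *learns* this
    combination), then cluster the spectral embedding with k-means."""
    n = features_per_modal[0].shape[0]
    L_shared = np.zeros((n, n))
    for X in features_per_modal:
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * sigma ** 2))            # Gaussian affinity
        Dm = np.diag(1.0 / np.sqrt(W.sum(1)))
        L_shared += np.eye(n) - Dm @ W @ Dm           # normalized Laplacian
    L_shared /= len(features_per_modal)
    _, U = eigh(L_shared, subset_by_index=[0, n_clusters - 1])
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters, n_init=10).fit_predict(U)

rng = np.random.default_rng(0)
modal1 = rng.random((30, 5)); modal2 = rng.random((30, 3))   # toy modalities
labels = shared_spectral_clustering([modal1, modal2], n_clusters=3)
```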

Reference BookDOI
01 Jan 2011
TL;DR: This reference book surveys image super-resolution, spanning adaptive Wiener filtering, kernel regression, Bayesian and pattern recognition approaches, registration, multi-channel reconstruction, and applications in medical imaging.
Abstract (table of contents; chapter, authors, and section headings):
- Image Super-Resolution: Historical Overview and Future Challenges (J. Yang and T. Huang): Introduction to Super-Resolution; Notations; Techniques for Super-Resolution; Challenge Issues for Super-Resolution
- Super-Resolution Using Adaptive Wiener Filters (R.C. Hardie): Introduction; Observation Model; AWF SR Algorithms; Experimental Results; Conclusions; Acknowledgments
- Locally Adaptive Kernel Regression for Space-Time Super-Resolution (H. Takeda and P. Milanfar): Introduction; Adaptive Kernel Regression; Examples; Conclusion; Appendix
- Super-Resolution with Probabilistic Motion Estimation (M. Protter and M. Elad): Introduction; Classic Super-Resolution: Background; The Proposed Algorithm; Experimental Validation; Summary
- Spatially Adaptive Filtering as Regularization in Inverse Imaging (A. Danielyan, A. Foi, V. Katkovnik, and K. Egiazarian): Introduction; Iterative Filtering as Regularization; Compressed Sensing; Super-Resolution; Conclusions
- Registration for Super-Resolution (P. Vandewalle, L. Sbaiz, and M. Vetterli): Camera Model; What Is Resolution?; Super-Resolution as a Multichannel Sampling Problem; Registration of Totally Aliased Signals; Registration of Partially Aliased Signals; Conclusions
- Towards Super-Resolution in the Presence of Spatially Varying Blur (M. Sorel, F. Sroubek, and J. Flusser): Introduction; Defocus and Optical Aberrations; Camera Motion Blur; Scene Motion; Algorithms; Conclusion; Acknowledgments
- Toward Robust Reconstruction-Based Super-Resolution (M. Tanaka and M. Okutomi): Introduction; Overviews; Robust SR Reconstruction with Pixel Selection; Robust Super-Resolution Using MPEG Motion Vectors; Robust Registration for Super-Resolution; Conclusions
- Multi-Frame Super-Resolution from a Bayesian Perspective (L. Pickup, S. Roberts, A. Zisserman, and D. Capel): The Generative Model; Where Super-Resolution Algorithms Go Wrong; Simultaneous Super-Resolution; Bayesian Marginalization; Concluding Remarks
- Variational Bayesian Super Resolution Reconstruction (S. Derin Babacan, R. Molina, and A.K. Katsaggelos): Introduction; Problem Formulation; Bayesian Framework for Super Resolution; Bayesian Inference; Variational Bayesian Inference Using TV Image Priors; Experiments; Estimation of Motion and Blur; Conclusions; Acknowledgements
- Pattern Recognition Techniques for Image Super-Resolution (K. Ni and T.Q. Nguyen): Introduction; Nearest Neighbor Super-Resolution; Markov Random Fields and Approximations; Kernel Machines for Image Super-Resolution; Multiple Learners and Multiple Regressions; Design Considerations and Examples; Remarks; Glossary
- Super-Resolution Reconstruction of Multi-Channel Images (O.G. Sezer and Y. Altunbasak): Introduction; Notation; Image Acquisition Model; Subspace Representation; Reconstruction Algorithm; Experiments and Discussions; Conclusion
- New Applications of Super-Resolution in Medical Imaging (M.D. Robinson, S.J. Chiu, C.A. Toth, J.A. Izatt, J.Y. Lo, and S. Farsiu): Introduction; The Super-Resolution Framework; New Medical Imaging Applications; Conclusion; Acknowledgment
- Practicing Super-Resolution: What Have We Learned? (N. Bozinovic): Introduction; MotionDSP: History and Concepts; Markets and Applications; Technology; Results; Lessons Learned; Conclusions

Journal ArticleDOI
TL;DR: The proposed approach (MKL-DR) generalizes the framework of multiple kernel learning for dimensionality reduction; it accommodates diverse image descriptors, extends a broad set of existing dimensionality reduction techniques, and addresses supervised, unsupervised, and semi-supervised learning problems.
Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way to improve performance. The resulting data representations are typically high-dimensional and assume diverse forms. Hence, finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. To this end, the proposed approach (termed MKL-DR) generalizes the framework of multiple kernel learning for dimensionality reduction, and distinguishes itself with the following three main contributions: First, our method provides the convenience of using diverse image descriptors to describe useful characteristics of various aspects of the underlying data. Second, it extends a broad set of existing dimensionality reduction techniques to consider multiple kernel learning, and consequently improves their effectiveness. Third, by focusing on the techniques pertaining to dimensionality reduction, the formulation introduces a new class of applications with the multiple kernel learning framework to address not only supervised learning problems but also unsupervised and semi-supervised ones.

Proceedings ArticleDOI
06 Nov 2011
TL;DR: A novel blur model is proposed that explicitly takes these outliers into account, and a robust non-blind deconvolution method is built upon it, which can effectively reduce the visual artifacts caused by outliers.
Abstract: Non-blind deconvolution is a key component in image deblurring systems. Previous deconvolution methods assume a linear blur model where the blurred image is generated by a linear convolution of the latent image and the blur kernel. This assumption often does not hold in practice due to various types of outliers in the imaging process. Without proper outlier handling, previous methods may generate results with severe ringing artifacts even when the kernel is estimated accurately. In this paper we analyze a few common types of outliers that cause previous methods to fail, such as pixel saturation and non-Gaussian noise. We propose a novel blur model that explicitly takes these outliers into account, and build a robust non-blind deconvolution method upon it, which can effectively reduce the visual artifacts caused by outliers. The effectiveness of our method is demonstrated by experimental results on both synthetic and real-world examples.
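
To make the outlier issue concrete, here is a hedged sketch of one standard remedy, iteratively reweighted deconvolution with a saturation mask. The paper instead builds an explicit probabilistic blur model; the weight function, step size, and thresholds below are assumptions.

```python
import numpy as np
from scipy.signal import fftconvolve

def robust_deconv(y, k, iters=30, step=1.0, sat=0.99, c=0.05):
    """Outlier-aware non-blind deconvolution via iteratively reweighted
    least squares: saturated pixels and heavy-tailed residuals get low
    weights so they stop driving ringing artifacts."""
    x = y.copy()
    k_flip = k[::-1, ::-1]                               # adjoint of convolution
    for _ in range(iters):
        r = fftconvolve(x, k, mode="same") - y           # data-fit residual
        w = (y < sat) / (1.0 + (r / c) ** 2)             # down-weight outliers
        x -= step * fftconvolve(w * r, k_flip, mode="same")  # weighted LS gradient
        x = np.clip(x, 0, 1)
    return x

y = np.clip(np.random.rand(64, 64), 0, 1)   # stand-in blurred observation
k = np.ones((5, 5)) / 25.0                  # known box blur kernel
x = robust_deconv(y, k)
```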

Journal ArticleDOI
TL;DR: In this article, trigonometric range kernels are used to realize the bilateral filter in constant time, generalizing the polynomial-kernel idea presented by Porikli.
Abstract: It is well known that spatial averaging can be realized (in the space or frequency domain) using algorithms whose complexity does not scale with the size or shape of the filter. These fast algorithms are generally referred to as constant-time or O(1) algorithms in the image-processing literature. Along with the spatial filter, the edge-preserving bilateral filter involves an additional range kernel. This is used to restrict the averaging to those neighborhood pixels whose intensities are similar or close to that of the pixel of interest. The range kernel operates by acting on the pixel intensities. This makes the averaging process nonlinear and computationally intensive, particularly when the spatial filter is large. In this paper, we show how the O(1) averaging algorithms can be leveraged to realize the bilateral filter in constant time, by using trigonometric range kernels. This is done by generalizing the idea presented by Porikli, i.e., using polynomial kernels. The class of trigonometric kernels turns out to be sufficiently rich, allowing for the approximation of the standard Gaussian bilateral filter. The attractive feature of our approach is that, for a fixed number of terms, the quality of approximation achieved using trigonometric kernels is much superior to that obtained by Porikli using polynomials.
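
The shiftability trick is compact enough to show directly. With a one-term range kernel cos(gamma * (I(q) - I(p))), the cosine addition formula splits the bilateral filter into four spatial convolutions, each computable by any O(1) filter. The paper uses a sum of raised cosines to approximate the Gaussian range kernel, so this single-term Python version is only the simplest instance.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def trig_bilateral(img, gamma, size=9):
    """One-term trigonometric bilateral filter: cos(g(Iq - Ip)) expands as
    cos(g Iq)cos(g Ip) + sin(g Iq)sin(g Ip), turning the nonlinear filter
    into four linear, constant-time convolutions. Requires
    gamma * |Iq - Ip| <= pi/2 so the range kernel stays nonnegative."""
    c, s = np.cos(gamma * img), np.sin(gamma * img)
    box = lambda x: uniform_filter(x, size)       # any O(1) spatial filter works
    num = c * box(c * img) + s * box(s * img)     # filtered numerator
    den = c * box(c) + s * box(s)                 # filtered normalization
    return num / (den + 1e-12)

img = np.random.rand(64, 64)                      # intensities in [0, 1]
out = trig_bilateral(img, gamma=np.pi / 2)        # pi/2 * 1 <= pi/2: valid
```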

Journal ArticleDOI
TL;DR: This paper proposes domain adaptation metric learning (DAML), introducing a data-dependent regularization to conventional metric learning in the reproducing kernel Hilbert space (RKHS), and proves that learning DAML in RKHS is equivalent to learning DAML in the space spanned by the principal components of kernel principal component analysis (KPCA).
Abstract: The state-of-the-art metric-learning algorithms cannot perform well in domain adaptation settings, such as cross-domain face recognition, image annotation, etc., because labeled data in the source domain and unlabeled data in the target domain are drawn from different, but related, distributions. In this paper, we propose domain adaptation metric learning (DAML), which introduces a data-dependent regularization to conventional metric learning in the reproducing kernel Hilbert space (RKHS). This data-dependent regularization resolves the distribution difference by minimizing the empirical maximum mean discrepancy between source and target domain data in RKHS. Theoretically, using the empirical Rademacher complexity, we prove risk bounds for the nearest neighbor classifier that uses the metric learned by DAML. Practically, learning the metric in RKHS does not scale well. Fortunately, we can prove that learning DAML in RKHS is equivalent to learning DAML in the space spanned by the principal components of kernel principal component analysis (KPCA). Thus, we can apply KPCA to select the most important principal components to significantly reduce the time cost of DAML. We perform extensive experiments over four well-known face recognition datasets and a large-scale Web image annotation dataset for cross-domain face recognition and image annotation tasks under various settings, and the results demonstrate the effectiveness of DAML.
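
The regularizer's central quantity, the empirical maximum mean discrepancy, is easy to compute. A sketch with an RBF kernel follows; the kernel choice, bandwidth, and data are assumptions.

```python
import numpy as np

def mmd2_rbf(Xs, Xt, sigma=1.0):
    """Empirical (biased) squared maximum mean discrepancy between source
    and target samples under an RBF kernel -- the quantity that DAML's
    data-dependent regularizer minimizes in RKHS."""
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return K(Xs, Xs).mean() + K(Xt, Xt).mean() - 2 * K(Xs, Xt).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (100, 5))
tgt = rng.normal(0.5, 1.0, (120, 5))    # shifted target distribution
print(mmd2_rbf(src, tgt))               # larger when the distributions differ
```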

Proceedings ArticleDOI
06 Nov 2011
TL;DR: An extension of bag-of-words image representations to encode spatial layout using the Fisher kernel framework and a Gaussian mixture model is introduced, which yields an image representation that is computationally efficient, compact, and yields excellent performance while using linear classifiers.
Abstract: We introduce an extension of bag-of-words image representations to encode spatial layout. Using the Fisher kernel framework we derive a representation that encodes the spatial mean and the variance of image regions associated with visual words. We extend this representation by using a Gaussian mixture model to encode spatial layout, and show that this model is related to a soft-assign version of the spatial pyramid representation. We also combine our representation of spatial layout with the use of Fisher kernels to encode the appearance of local features. Through an extensive experimental evaluation, we show that our representation yields state-of-the-art image categorization results, while being more compact than spatial pyramid representations. In particular, using Fisher kernels to encode both appearance and spatial layout results in an image representation that is computationally efficient, compact, and yields excellent performance while using linear classifiers.
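
The gist of the spatial extension can be shown with hard assignments: augment each visual word's count with the mean and variance of the positions assigned to it. The paper's actual encoding is the soft-assign, Fisher kernel version; the Python sketch below is only the hard-assignment caricature.

```python
import numpy as np

def spatial_bow(positions, words, vocab_size):
    """For each visual word, store its count plus the mean and variance of
    the normalized (x, y) positions of the regions assigned to it,
    yielding 5 values per word instead of 1."""
    feats = []
    for w in range(vocab_size):
        p = positions[words == w]                  # (x, y) in [0, 1]^2
        if len(p) == 0:
            feats.extend([0.0] * 5)
            continue
        feats.extend([len(p), *p.mean(0), *p.var(0)])
    return np.asarray(feats)

rng = np.random.default_rng(0)
pos = rng.random((200, 2))                          # toy keypoint locations
w = rng.integers(0, 16, 200)                        # toy word assignments
f = spatial_bow(pos, w, vocab_size=16)              # 16 * 5 feature values
```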

Journal ArticleDOI
TL;DR: A framework for a medical image analysis system for brain tumor segmentation and follow-up over time using multi-spectral MRI images is presented; quantitative evaluations against experts' manual traces and other approaches demonstrate the effectiveness of the proposed method.

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper presents a joint blind image restoration and recognition method based on the sparse representation prior to handle the challenging problem of face recognition from low-quality images, where the degradation model is realistic and totally unknown.
Abstract: Most previous visual recognition systems simply assume ideal inputs without real-world degradations, such as low resolution, motion blur and out-of-focus blur. In presence of such unknown degradations, the conventional approach first resorts to blind image restoration and then feeds the restored image into a classifier. Treating restoration and recognition separately, such a straightforward approach, however, suffers greatly from the defective output of the ill-posed blind image restoration. In this paper, we present a joint blind image restoration and recognition method based on the sparse representation prior to handle the challenging problem of face recognition from low-quality images, where the degradation model is realistic and totally unknown. The sparse representation prior states that the degraded input image, if correctly restored, will have a good sparse representation in terms of the training set, which indicates the identity of the test image. The proposed algorithm achieves simultaneous restoration and recognition by iteratively solving the blind image restoration in pursuit of the sparsest representation for recognition. Based on such a sparse representation prior, we demonstrate that the image restoration task and the recognition task can benefit greatly from each other. Extensive experiments on face datasets under various degradations are carried out, and the results of our joint model show significant improvements over conventional methods that treat the two tasks independently.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This work estimates camera shake by analyzing edges in the image, effectively constructing the Radon transform of the kernel, and describes two algorithms for estimating spatially invariant blur kernels.
Abstract: Camera shake is a common source of degradation in photographs. Restoring blurred pictures is challenging because both the blur kernel and the sharp image are unknown, which makes this problem severely underconstrained. In this work, we estimate camera shake by analyzing edges in the image, effectively constructing the Radon transform of the kernel. Building upon this result, we describe two algorithms for estimating spatially invariant blur kernels. In the first method, we directly invert the transform, which is computationally efficient since it is not necessary to also estimate the latent sharp image. This approach is well suited for scenes with a diversity of edges, such as man-made environments. In the second method, we incorporate the Radon transform within the MAP estimation framework to jointly estimate the kernel and the image. While more expensive, this algorithm performs well on a broader variety of scenes, even when fewer edges can be observed. Our experiments show that our algorithms achieve comparable results to the state of the art in general and produce superior outputs on man-made scenes and photos degraded by a small kernel.

Patent
03 Apr 2011
TL;DR: Systems and methods for relation extraction in text are disclosed: a convolution strategy determines a kernel between sentences, semi-supervised strategies encode syntactic and semantic information, and a classifier applied to the kernel identifies the relational pattern of interest in response to a query.
Abstract: Systems and methods are disclosed to perform relation extraction in text by applying a convolution strategy to determine a kernel between sentences; applying one or more semi-supervised strategies to the kernel to encode syntactic and semantic information to recover a relational pattern of interest; and applying a classifier to the kernel to identify the relational pattern of interest in the text in response to a query.

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work demonstrates how the kernel trick can be applied in standard NRSFM to model complex, deformable 3D shapes as the outputs of a non-linear mapping whose inputs are points within a low-dimensional shape space.
Abstract: Non-rigid structure from motion (NRSFM) is a difficult, underconstrained problem in computer vision. The standard approach in NRSFM constrains 3D shape deformation using a linear combination of K basis shapes; the solution is then obtained as the low-rank factorization of an input observation matrix. An important but overlooked problem with this approach is that non-linear deformations are often observed; these deformations lead to a weakened low-rank constraint due to the need to use additional basis shapes to linearly model points that move along curves. Here, we demonstrate how the kernel trick can be applied in standard NRSFM. As a result, we model complex, deformable 3D shapes as the outputs of a non-linear mapping whose inputs are points within a low-dimensional shape space. This approach is flexible and can use different kernels to build different non-linear models. Using the kernel trick, our model complements the low-rank constraint by capturing non-linear relationships in the shape coefficients of the linear model. The net effect can be seen as using non-linear dimensionality reduction to further compress the (shape) space of possible solutions.

Book ChapterDOI
18 Sep 2011
TL;DR: A generative model extending least squares linear regression to the space of images by using a second-order dynamic formulation for image registration, which allows for a compact representation of an approximation to the full spatio-temporal trajectory through its initial values.
Abstract: Registration of image-time series has so far been accomplished (i) by concatenating registrations between image pairs, (ii) by solving a joint estimation problem resulting in piecewise geodesic paths between image pairs, (iii) by kernel based local averaging or (iv) by augmenting the joint estimation with additional temporal irregularity penalties. Here, we propose a generative model extending least squares linear regression to the space of images by using a second-order dynamic formulation for image registration. Unlike previous approaches, the formulation allows for a compact representation of an approximation to the full spatio-temporal trajectory through its initial values. The method also opens up possibilities to design image-based approximation algorithms. The resulting optimization problem is solved using an adjoint method.

Journal ArticleDOI
TL;DR: The main difficulty in using Gaussian kernel-based AD methods is the optimal setting of sigma; this paper addresses it with a direct and adaptive measure based on a geometric interpretation of the GK-SVDD.
Abstract: Recently, anomaly detection (AD) has attracted considerable interest in a wide variety of hyperspectral remote sensing applications. The goal of this unsupervised technique of target detection is to identify the pixels with significantly different spectral signatures from the neighboring background. Kernel methods, such as kernel-based support vector data description (SVDD) (K-SVDD), have been presented as the successful approach to AD problems. The most commonly used kernel is the Gaussian kernel function. The main problem using the Gaussian kernel-based AD methods is the optimal setting of sigma. In an attempt to address this problem, this paper proposes a direct and adaptive measure for Gaussian K-SVDD (GK-SVDD). The proposed measure is based on a geometric interpretation of the GK-SVDD. Experimental results are presented on real and synthetically implanted targets of the target detection blind-test data sets. Compared to previous measures, the results demonstrate better performance, particularly for subpixel anomalies.
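
For context, a hedged sketch of Gaussian-kernel one-class detection with scikit-learn: for RBF kernels the one-class SVM is closely related to SVDD, and the fixed sigma below is exactly the parameter the paper proposes to set adaptively. The data are synthetic stand-ins for background and anomalous spectra.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Kernel-SVDD-style anomaly detection on toy "pixel spectra". The paper's
# contribution, an adaptive geometry-based choice of sigma, is replaced
# here by a hand-picked value.
rng = np.random.default_rng(0)
background = rng.normal(0, 1, (500, 10))            # background spectra
targets = rng.normal(4, 1, (5, 10))                 # implanted anomalies
sigma = 2.0                                         # fixed RBF width (assumption)
model = OneClassSVM(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), nu=0.1)
model.fit(background)
print(model.decision_function(targets))             # negative = anomalous
```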

Proceedings ArticleDOI
12 Feb 2011
TL;DR: An OpenCL framework that combines multiple GPUs and treats them as a single compute device is proposed and its performance is evaluated with a system that contains 8 GPUs using 11 OpenCL benchmark applications.
Abstract: In this paper, we propose an OpenCL framework that combines multiple GPUs and treats them as a single compute device. Providing a single virtual compute device image to the user makes an OpenCL application written for a single GPU portable to a platform that has multiple GPU devices. It also makes the application exploit the full computing power of the multiple GPU devices and the total amount of GPU memory available in the platform. Our OpenCL framework automatically distributes the OpenCL kernel written for a single GPU at run time into multiple CUDA kernels that execute on the multiple GPU devices. It applies a run-time memory access range analysis to the kernel by performing a sampling run and identifies an optimal workload distribution for the kernel. To achieve a single compute device image, the runtime maintains virtual device memory that is allocated in the main memory. The OpenCL runtime treats this memory as if it were the memory of a single GPU device and keeps it consistent with the memories of the multiple GPU devices. Our OpenCL-C-to-C translator generates the sampling code from the OpenCL kernel code, and our OpenCL-C-to-CUDA-C translator generates the CUDA kernel code for the distributed OpenCL kernel. We show the effectiveness of our OpenCL framework by implementing the OpenCL runtime and the two source-to-source translators. We evaluate its performance with a system that contains 8 GPUs using 11 OpenCL benchmark applications.

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper introduces a kernelized version of Naive Bayes Nearest Neighbor, and shows that the NBNN kernel is complementary to standard bag-of-features based kernels, focussing on local generalization as opposed to global image composition.
Abstract: Naive Bayes Nearest Neighbor (NBNN) has recently been proposed as a powerful, non-parametric approach for object classification, that manages to achieve remarkably good results thanks to the avoidance of a vector quantization step and the use of image-to-class comparisons, yielding good generalization. In this paper, we introduce a kernelized version of NBNN. This way, we can learn the classifier in a discriminative setting. Moreover, it then becomes straightforward to combine it with other kernels. In particular, we show that our NBNN kernel is complementary to standard bag-of-features based kernels, focussing on local generalization as opposed to global image composition. By combining them, we achieve state-of-the-art results on Caltech101 and 15 Scenes datasets. As a side contribution, we also investigate how to speed up the NBNN computations.
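
The plain NBNN baseline that the paper kernelizes is a few lines with a KD-tree per class; the descriptors here are synthetic.

```python
import numpy as np
from scipy.spatial import cKDTree

def nbnn_classify(test_descr, class_descr):
    """Plain NBNN (the non-kernelized baseline the paper builds on): sum,
    over the test image's local descriptors, the squared distance to the
    nearest descriptor of each class; predict the class with the smallest
    total image-to-class distance."""
    trees = {c: cKDTree(D) for c, D in class_descr.items()}
    scores = {c: (t.query(test_descr)[0] ** 2).sum() for c, t in trees.items()}
    return min(scores, key=scores.get)

rng = np.random.default_rng(0)
classes = {c: rng.normal(c, 1.0, (500, 8)) for c in (0, 1, 2)}  # pooled descriptors
test = rng.normal(1.0, 1.0, (50, 8))                            # drawn near class 1
print(nbnn_classify(test, classes))                             # prints 1
```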

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper proposes a supervised segmentation approach that tightly integrates object-level top-down information with low-level image cues under a kernelized structural SVM learning framework, and defines a novel nonlinear kernel for comparing two image-segmentation masks.
Abstract: Object segmentation needs to be driven by top-down knowledge to produce semantically meaningful results. In this paper, we propose a supervised segmentation approach that tightly integrates object-level top-down information with low-level image cues. The information from the two levels is fused under a kernelized structural SVM learning framework. We define a novel nonlinear kernel for comparing two image-segmentation masks. This kernel combines four different kernels: the object similarity kernel, the object shape kernel, the per-image color distribution kernel, and the global color distribution kernel. Our experiments show that the structural SVM algorithm finds bad segmentations of the training examples given the current scoring function and punishes these bad segmentations with lower scores than the example (good) segmentations. The result is a segmentation algorithm that not only knows what good segmentations are, but also learns potential segmentation mistakes and tries to avoid them. Our proposed approach obtains performance comparable to other state-of-the-art top-down driven segmentation approaches, yet is flexible enough to be applied to widely different domains.

Journal ArticleDOI
11 Oct 2011-Sensors
TL;DR: KDIsomap is used to perform nonlinear dimensionality reduction on the extracted local binary patterns (LBP) facial features, producing low-dimensional discriminant embedded data representations with striking performance improvements on facial expression recognition tasks.
Abstract: Facial expression recognition is an interesting and challenging subject. Considering the nonlinear manifold structure of facial images, a new kernel-based manifold learning method, called kernel discriminant isometric mapping (KDIsomap), is proposed. KDIsomap aims to nonlinearly extract the discriminant information by maximizing the interclass scatter while minimizing the intraclass scatter in a reproducing kernel Hilbert space. KDIsomap is used to perform nonlinear dimensionality reduction on the extracted local binary patterns (LBP) facial features, producing low-dimensional discriminant embedded data representations with striking performance improvements on facial expression recognition tasks. The nearest neighbor classifier with the Euclidean metric is used for facial expression classification. Facial expression recognition experiments are performed on two popular facial expression databases, i.e., the JAFFE database and the Cohn-Kanade database. Experimental results indicate that KDIsomap obtains the best accuracy of 81.59% on the JAFFE database, and 94.88% on the Cohn-Kanade database. KDIsomap outperforms the other methods used, such as principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA), kernel linear discriminant analysis (KLDA) as well as kernel isometric mapping (KIsomap).

Proceedings ArticleDOI
20 Jun 2011
TL;DR: This work proposes spatial group sparse coding (SGSC), which extends the robust encoding ability of group sparse coding with spatial correlations among training regions, together with a joint version of the SGSC model that simultaneously encodes a group of intrinsically related regions within a test image.
Abstract: Numerous social images have been emerging on the Web. How to precisely label these images is critical to image retrieval. However, traditional image-level tagging methods may become less effective because global image matching approaches can hardly cope with the diversity and arbitrariness of Web image content. This raises an urgent need for fine-grained tagging schemes. In this work, we study how to establish a mapping between tags and image regions, i.e., how to localize tags to image regions, so as to better depict and index the content of images. We propose spatial group sparse coding (SGSC), extending the robust encoding ability of group sparse coding with spatial correlations among training regions. We present spatial correlations in a two-dimensional image space and design group-specific spatial kernels to produce a more interpretable regularizer. Further, we propose a joint version of the SGSC model which is able to simultaneously encode a group of intrinsically related regions within a test image. An effective algorithm is developed to optimize the objective function of the joint SGSC. The tag localization task is conducted by propagating tags from sparsely selected groups of regions to the target regions according to the reconstruction coefficients. Extensive experiments on three public image datasets illustrate that our proposed models achieve great performance improvements over the state-of-the-art method in the tag localization task.