scispace - formally typeset
Search or ask a question

Showing papers on "Image processing published in 2008"


Journal ArticleDOI
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

12,449 citations


Journal ArticleDOI
TL;DR: A novel method for sparse signal recovery that in many situations outperforms ℓ1 minimization in the sense that substantially fewer measurements are needed for exact recovery.
Abstract: It is now well understood that (1) it is possible to reconstruct sparse signals exactly from what appear to be highly incomplete sets of linear measurements and (2) that this can be done by constrained l1 minimization. In this paper, we study a novel method for sparse signal recovery that in many situations outperforms l1 minimization in the sense that substantially fewer measurements are needed for exact recovery. The algorithm consists of solving a sequence of weighted l1-minimization problems where the weights used for the next iteration are computed from the value of the current solution. We present a series of experiments demonstrating the remarkable performance and broad applicability of this algorithm in the areas of sparse signal recovery, statistical estimation, error correction and image processing. Interestingly, superior gains are also achieved when our method is applied to recover signals with assumed near-sparsity in overcomplete representations—not by reweighting the l1 norm of the coefficient sequence as is common, but by reweighting the l1 norm of the transformed object. An immediate consequence is the possibility of highly efficient data acquisition protocols by improving on a technique known as Compressive Sensing.

4,869 citations


Book
01 Jan 2008
TL;DR: The central concept of sparsity is explained and applied to signal compression, noise reduction, and inverse problems, while coverage is given to sparse representations in redundant dictionaries, super-resolution and compressive sensing applications.
Abstract: Mallat's book is the undisputed reference in this field - it is the only one that covers the essential material in such breadth and depth. - Laurent Demanet, Stanford University The new edition of this classic book gives all the major concepts, techniques and applications of sparse representation, reflecting the key role the subject plays in today's signal processing. The book clearly presents the standard representations with Fourier, wavelet and time-frequency transforms, and the construction of orthogonal bases with fast algorithms. The central concept of sparsity is explained and applied to signal compression, noise reduction, and inverse problems, while coverage is given to sparse representations in redundant dictionaries, super-resolution and compressive sensing applications. Features: * Balances presentation of the mathematics with applications to signal processing * Algorithms and numerical examples are implemented in WaveLab, a MATLAB toolbox * Companion website for instructors and selected solutions and code available for students New in this edition * Sparse signal representations in dictionaries * Compressive sensing, super-resolution and source separation * Geometric image processing with curvelets and bandlets * Wavelets for computer graphics with lifting on surfaces * Time-frequency audio processing and denoising * Image compression with JPEG-2000 * New and updated exercises A Wavelet Tour of Signal Processing: The Sparse Way, third edition, is an invaluable resource for researchers and R&D engineers wishing to apply the theory in fields such as image processing, video processing and compression, bio-sensing, medical imaging, machine vision and communications engineering. Stephane Mallat is Professor in Applied Mathematics at cole Polytechnique, Paris, France. From 1986 to 1996 he was a Professor at the Courant Institute of Mathematical Sciences at New York University, and between 2001 and 2007, he co-founded and became CEO of an image processing semiconductor company. Companion website: A Numerical Tour of Signal Processing * Includes all the latest developments since the book was published in 1999, including its application to JPEG 2000 and MPEG-4 * Algorithms and numerical examples are implemented in Wavelab, a MATLAB toolbox * Balances presentation of the mathematics with applications to signal processing

2,600 citations


Journal ArticleDOI
TL;DR: For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
Abstract: With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 x 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the Wordnet lexical database. Hence the image database gives a comprehensive coverage of all object categories and scenes. The semantic information from Wordnet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.

1,871 citations


Journal ArticleDOI
TL;DR: A closed-form solution to natural image matting that allows us to find the globally optimal alpha matte by solving a sparse linear system of equations and predicts the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms.
Abstract: Interactive digital matting, the process of extracting a foreground object from an image based on limited user input, is an important task in image and video editing. From a computer vision perspective, this task is extremely challenging because it is massively ill-posed - at each pixel we must estimate the foreground and the background colors, as well as the foreground opacity ("alpha matte") from a single color measurement. Current approaches either restrict the estimation to a small part of the image, estimating foreground and background colors based on nearby pixels where they are known, or perform iterative nonlinear estimation by alternating foreground and background color estimation with alpha estimation. In this paper, we present a closed-form solution to natural image matting. We derive a cost function from local smoothness assumptions on foreground and background colors and show that in the resulting expression, it is possible to analytically eliminate the foreground and background colors to obtain a quadratic cost function in alpha. This allows us to find the globally optimal alpha matte by solving a sparse linear system of equations. Furthermore, the closed-form formula allows us to predict the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms. We show that high-quality mattes for natural images may be obtained from a small amount of user input.

1,851 citations


Journal ArticleDOI
TL;DR: This work puts forward ways for handling nonhomogeneous noise and missing information, paving the way to state-of-the-art results in applications such as color image denoising, demosaicing, and inpainting, as demonstrated in this paper.
Abstract: Sparse representations of signals have drawn considerable interest in recent years. The assumption that natural signals, such as images, admit a sparse decomposition over a redundant dictionary leads to efficient algorithms for handling such sources of data. In particular, the design of well adapted dictionaries for images has been a major challenge. The K-SVD has been recently proposed for this task and shown to perform very well for various grayscale image processing tasks. In this paper, we address the problem of learning dictionaries for color images and extend the K-SVD-based grayscale image denoising algorithm that appears in . This work puts forward ways for handling nonhomogeneous noise and missing information, paving the way to state-of-the-art results in applications such as color image denoising, demosaicing, and inpainting, as demonstrated in this paper.

1,818 citations


Journal ArticleDOI
TL;DR: An iterative algorithm, based on recent work in compressive sensing, that minimizes the total variation of the image subject to the constraint that the estimated projection data is within a specified tolerance of the available data and that the values of the volume image are non-negative is developed.
Abstract: An iterative algorithm, based on recent work in compressive sensing, is developed for volume image reconstruction from a circular cone-beam scan. The algorithm minimizes the total variation (TV) of the image subject to the constraint that the estimated projection data is within a specified tolerance of the available data and that the values of the volume image are non-negative. The constraints are enforced by the use of projection onto convex sets (POCS) and the TV objective is minimized by steepest descent with an adaptive step-size. The algorithm is referred to as adaptive-steepest-descent-POCS (ASD-POCS). It appears to be robust against cone-beam artifacts, and may be particularly useful when the angular range is limited or when the angular sampling rate is low. The ASD-POCS algorithm is tested with the Defrise disk and jaw computerized phantoms. Some comparisons are performed with the POCS and expectation-maximization (EM) algorithms. Although the algorithm is presented in the context of circular cone-beam image reconstruction, it can also be applied to scanning geometries involving other x-ray source trajectories.

1,786 citations


Journal ArticleDOI
TL;DR: Three new algorithms for 2D translation image registration to within a small fraction of a pixel that use nonlinear optimization and matrix-multiply discrete Fourier transforms are compared to evaluate a translation-invariant error metric.
Abstract: Three new algorithms for 2D translation image registration to within a small fraction of a pixel that use nonlinear optimization and matrix-multiply discrete Fourier transforms are compared. These algorithms can achieve registration with an accuracy equivalent to that of the conventional fast Fourier transform upsampling approach in a small fraction of the computation time and with greatly reduced memory requirements. Their accuracy and computation time are compared for the purpose of evaluating a translation-invariant error metric.

1,715 citations


Journal ArticleDOI
TL;DR: This work proposes a region-based active contour model that draws upon intensity information in local regions at a controllable scale to cope with intensity inhomogeneity and shows desirable performances of this model.
Abstract: Intensity inhomogeneities often occur in real-world images and may cause considerable difficulties in image segmentation. In order to overcome the difficulties caused by intensity inhomogeneities, we propose a region-based active contour model that draws upon intensity information in local regions at a controllable scale. A data fitting energy is defined in terms of a contour and two fitting functions that locally approximate the image intensities on the two sides of the contour. This energy is then incorporated into a variational level set formulation with a level set regularization term, from which a curve evolution equation is derived for energy minimization. Due to a kernel function in the data fitting term, intensity information in local regions is extracted to guide the motion of the contour, which thereby enables our model to cope with intensity inhomogeneity. In addition, the regularity of the level set function is intrinsically preserved by the level set regularization term to ensure accurate computation and avoids expensive reinitialization of the evolving level set function. Experimental results for synthetic and real images show desirable performances of our method.

1,630 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: It is shown that a small set of randomly chosen raw patches from training images of similar statistical nature to the input image generally serve as a good dictionary, in the sense that the computed representation is sparse and the recovered high-resolution image is competitive or even superior in quality to images produced by other SR methods.
Abstract: This paper addresses the problem of generating a super-resolution (SR) image from a single low-resolution input image. We approach this problem from the perspective of compressed sensing. The low-resolution image is viewed as downsampled version of a high-resolution image, whose patches are assumed to have a sparse representation with respect to an over-complete dictionary of prototype signal-atoms. The principle of compressed sensing ensures that under mild conditions, the sparse representation can be correctly recovered from the downsampled signal. We will demonstrate the effectiveness of sparsity as a prior for regularizing the otherwise ill-posed super-resolution problem. We further show that a small set of randomly chosen raw patches from training images of similar statistical nature to the input image generally serve as a good dictionary, in the sense that the computed representation is sparse and the recovered high-resolution image is competitive or even superior in quality to images produced by other SR methods.

1,546 citations


Book
26 Dec 2008
TL;DR: The central concept of sparsity is explained and applied to signal compression, noise reduction, and inverse problems, while coverage is given to sparse representations in redundant dictionaries, super-resolution and compressive sensing applications.
Abstract: Mallat's book is the undisputed reference in this field - it is the only one that covers the essential material in such breadth and depth. - Laurent Demanet, Stanford UniversityThe new edition of this classic book gives all the major concepts, techniques and applications of sparse representation, reflecting the key role the subject plays in today's signal processing. The book clearly presents the standard representations with Fourier, wavelet and time-frequency transforms, and the construction of orthogonal bases with fast algorithms. The central concept of sparsity is explained and applied to signal compression, noise reduction, and inverse problems, while coverage is given to sparse representations in redundant dictionaries, super-resolution and compressive sensing applications.Features:* Balances presentation of the mathematics with applications to signal processing* Algorithms and numerical examples are implemented in WaveLab, a MATLAB toolbox* Companion website for instructors and selected solutions and code available for studentsNew in this edition* Sparse signal representations in dictionaries* Compressive sensing, super-resolution and source separation* Geometric image processing with curvelets and bandlets* Wavelets for computer graphics with lifting on surfaces* Time-frequency audio processing and denoising* Image compression with JPEG-2000* New and updated exercisesA Wavelet Tour of Signal Processing: The Sparse Way, third edition, is an invaluable resource for researchers and R&D engineers wishing to apply the theory in fields such as image processing, video processing and compression, bio-sensing, medical imaging, machine vision and communications engineering.Stephane Mallat is Professor in Applied Mathematics at cole Polytechnique, Paris, France. From 1986 to 1996 he was a Professor at the Courant Institute of Mathematical Sciences at New York University, and between 2001 and 2007, he co-founded and became CEO of an image processing semiconductor company.Companion website: A Numerical Tour of Signal Processing Includes all the latest developments since the book was published in 1999, including itsapplication to JPEG 2000 and MPEG-4Algorithms and numerical examples are implemented in Wavelab, a MATLAB toolboxBalances presentation of the mathematics with applications to signal processing

Book ChapterDOI
20 Oct 2008
TL;DR: This paper shows that the dramatically different approach of using priors dynamically to guide a feature by feature matching search can achieve global matching with much fewer image processing operations and lower overall computational cost.
Abstract: In the matching tasks which form an integral part of all types of tracking and geometrical vision, there are invariably priors available on the absolute and/or relative image locations of features of interest. Usually, these priors are used post-hoc in the process of resolving feature matches and obtaining final scene estimates, via `first get candidate matches, then resolve' consensus algorithms such as RANSAC. In this paper we show that the dramatically different approach of using priors dynamically to guide a feature by feature matching search can achieve global matching with much fewer image processing operations and lower overall computational cost. Essentially, we put image processing into the loopof the search for global consensus. In particular, our approach is able to cope with significant image ambiguity thanks to a dynamic mixture of Gaussians treatment. In our fully Bayesian algorithm, the choice of the most efficient search action at each step is guided intuitively and rigorously by expected Shannon information gain. We demonstrate the algorithm in feature matching as part of a sequential SLAM system for 3D camera tracking. Robust, real-time matching can be achieved even in the previously unmanageable case of jerky, rapid motion necessitating weak motion modelling and large search regions.

Journal ArticleDOI
TL;DR: An extensive evaluation of the unsupervised objective evaluation methods that have been proposed in the literature are presented and the advantages and shortcomings of the underlying design mechanisms in these methods are discussed and analyzed.

Journal ArticleDOI
TL;DR: An overview of the fundamental principles of operation of this technology and the influence of geometric and software parameters on image quality and patient radiation dose are provided.

Proceedings Article
08 Dec 2008
TL;DR: An approach to low-level vision is presented that combines the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models to avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference.
Abstract: We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind de-noising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters.

Journal ArticleDOI
TL;DR: A unified framework for identifying the source digital camera from its images and for revealing digitally altered images using photo-response nonuniformity noise (PRNU), which is a unique stochastic fingerprint of imaging sensors is provided.
Abstract: In this paper, we provide a unified framework for identifying the source digital camera from its images and for revealing digitally altered images using photo-response nonuniformity noise (PRNU), which is a unique stochastic fingerprint of imaging sensors. The PRNU is obtained using a maximum-likelihood estimator derived from a simplified model of the sensor output. Both digital forensics tasks are then achieved by detecting the presence of sensor PRNU in specific regions of the image under investigation. The detection is formulated as a hypothesis testing problem. The statistical distribution of the optimal test statistics is obtained using a predictor of the test statistics on small image blocks. The predictor enables more accurate and meaningful estimation of probabilities of false rejection of a correct camera and missed detection of a tampered region. We also include a benchmark implementation of this framework and detailed experimental validation. The robustness of the proposed forensic methods is tested on common image processing, such as JPEG compression, gamma correction, resizing, and denoising.

Journal ArticleDOI
TL;DR: A system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes that extends existing algorithms to meet the robustness and variability necessary to operate out of the lab and shows results on real video sequences comprising hundreds of thousands of frames.
Abstract: The paper presents a system for automatic, geo-registered, real-time 3D reconstruction from video of urban scenes. The system collects video streams, as well as GPS and inertia measurements in order to place the reconstructed models in geo-registered coordinates. It is designed using current state of the art real-time modules for all processing steps. It employs commodity graphics hardware and standard CPU's to achieve real-time performance. We present the main considerations in designing the system and the steps of the processing pipeline. Our system extends existing algorithms to meet the robustness and variability necessary to operate out of the lab. To account for the large dynamic range of outdoor videos the processing pipeline estimates global camera gain changes in the feature tracking stage and efficiently compensates for these in stereo estimation without impacting the real-time performance. The required accuracy for many applications is achieved with a two-step stereo reconstruction process exploiting the redundancy across frames. We show results on real video sequences comprising hundreds of thousands of frames.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This article proposes an energy formulation with both sparse reconstruction and class discrimination components, jointly optimized during dictionary learning, for local image discrimination tasks, and paves the way for a novel scene analysis and recognition framework based on simultaneously learning discriminative and reconstructive dictionaries.
Abstract: Sparse signal models have been the focus of much recent research, leading to (or improving upon) state-of-the-art results in signal, image, and video restoration. This article extends this line of research into a novel framework for local image discrimination tasks, proposing an energy formulation with both sparse reconstruction and class discrimination components, jointly optimized during dictionary learning. This approach improves over the state of the art in texture segmentation experiments using the Brodatz database, and it paves the way for a novel scene analysis and recognition framework based on simultaneously learning discriminative and reconstructive dictionaries. Preliminary results in this direction using examples from the Pascal VOC06 and Graz02 datasets are presented as well.

Journal ArticleDOI
TL;DR: This work proposes an approach based on self organization through artificial neural networks, widely applied in human image processing systems and more generally in cognitive science, that can handle scenes containing moving backgrounds, gradual illumination variations and camouflage, and achieves robust detection for different types of videos taken with stationary cameras.
Abstract: Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications. Aside from the intrinsic usefulness of being able to segment video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient. We propose an approach based on self organization through artificial neural networks, widely applied in human image processing systems and more generally in cognitive science. The proposed approach can handle scenes containing moving backgrounds, gradual illumination variations and camouflage, has no bootstrapping limitations, can include into the background model shadows cast by moving objects, and achieves robust detection for different types of videos taken with stationary cameras. We compare our method with other modeling techniques and report experimental results, both in terms of detection accuracy and in terms of processing speed, for color video sequences that represent typical situations critical for video surveillance systems.

Journal ArticleDOI
TL;DR: The framework takes advantage of the integration of image processing, geometric analysis and mesh generation techniques, with an accent on full automation and high-level interaction, to be performed in the context of large-scale studies.
Abstract: We present a modeling framework designed for patient-specific computational hemodynamics to be performed in the context of large-scale studies. The framework takes advantage of the integration of image processing, geometric analysis and mesh generation techniques, with an accent on full automation and high-level interaction. Image segmentation is performed using implicit deformable models taking advantage of a novel approach for selective initialization of vascular branches, as well as of a strategy for the segmentation of small vessels. A robust definition of centerlines provides objective geometric criteria for the automation of surface editing and mesh generation. The framework is available as part of an open-source effort, the Vascular Modeling Toolkit, a first step towards the sharing of tools and data which will be necessary for computational hemodynamics to play a role in evidence-based medicine.

Journal ArticleDOI
TL;DR: This work proposes a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.
Abstract: We consider the task of 3-d depth estimation from a single still image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured indoor and outdoor environments which include forests, sidewalks, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the value of the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a hierarchical, multiscale Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models the depths and the relation between depths at different points in the image. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. We further propose a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.

Journal ArticleDOI
TL;DR: In this paper, Wang and Bovik's image quality index (QI) was used to evaluate the quality of pan-chromatic multispectral images without resorting to reference originals.
Abstract: This paper introduces a novel approach for evaluating the quality of pansharpened multispectral (MS) imagery without resorting to reference originals. Hence, evaluations are feasible at the highest spatial resolution of the panchromatic (PAN) sensor. Wang and Bovik’s image quality index (QI) provides a statistical similarity measurement between two monochrome images. The QI values between any couple of MS bands are calculated before and after fusion and used to define a measurement of spectral distortion. Analogously, QI values between each MS band and the PAN image are calculated before and after fusion to yield a measurement of spatial distortion. The rationale is that such QI values should be unchanged after fusion, i.e., when the spectral information is translated from the coarse scale of the MS data to the fine scale of the PAN image. Experimental results, carried out on very high-resolution Ikonos data and simulated Pleiades data, demonstrate that the results provided by the proposed approach are consistent and in trend with analysis performed on spatially degraded data. However, the proposed method requires no reference originals and is therefore usable in all practical cases.

Journal ArticleDOI
TL;DR: It is shown that the Amelioration de la Resolution Spatiale par Injection de Structures concept prevents from introducing spectral distortion into fused products and offers a reliable framework for further developments.
Abstract: Our framework is the synthesis of multispectral images (MS) at higher spatial resolution, which should be as close as possible to those that would have been acquired by the corresponding sensors if they had this high resolution. This synthesis is performed with the help of a high spatial but low spectral resolution image: the panchromatic (Pan) image. The fusion of the Pan and MS images is classically referred as pan-sharpening. A fused product reaches good quality only if the characteristics and differences between input images are taken into account. Dissimilarities existing between these two data sets originate from two causes-different times and different spectral bands of acquisition. Remote sensing physics should be carefully considered while designing the fusion process. Because of the complexity of physics and the large number of unknowns, authors are led to make assumptions to drive their development. Weaknesses and strengths of each reported method are raised and confronted to these physical constraints. The conclusion of this critical survey of literature is that the choice in the assumptions for the development of a method is crucial, with the risk to drastically weaken fusion performance. It is also shown that the Amelioration de la Resolution Spatiale par Injection de Structures concept prevents from introducing spectral distortion into fused products and offers a reliable framework for further developments.

Journal ArticleDOI
TL;DR: This paper offers to researchers a link to a public image database to define a common reference point for LPR algorithmic assessment and issues such as processing time, computational power, and recognition rate are addressed.
Abstract: License plate recognition (LPR) algorithms in images or videos are generally composed of the following three processing steps: 1) extraction of a license plate region; 2) segmentation of the plate characters; and 3) recognition of each character This task is quite challenging due to the diversity of plate formats and the nonuniform outdoor illumination conditions during image acquisition Therefore, most approaches work only under restricted conditions such as fixed illumination, limited vehicle speed, designated routes, and stationary backgrounds Numerous techniques have been developed for LPR in still images or video sequences, and the purpose of this paper is to categorize and assess them Issues such as processing time, computational power, and recognition rate are also addressed, when available Finally, this paper offers to researchers a link to a public image database to define a common reference point for LPR algorithmic assessment

Journal ArticleDOI
TL;DR: This paper model the distribution of the texture features using a mixture of Gaussian distributions, allowing the mixture components to be degenerate or nearly-degenerate, and shows that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach.

Journal ArticleDOI
TL;DR: The proposed framework provides an alternative to pre-defined dictionaries such as wavelets, and leads to state-of-the-art results in a number of image and video enhancement and restoration applications.
Abstract: This paper presents a framework for learning multiscale sparse representations of color images and video with overcomplete dictionaries. A single-scale K-SVD algorithm was introduced in [M. Aharon, M. Elad, and A. M. Bruckstein, IEEE Trans. Signal Process., 54 (2006), pp. 4311–4322], formulating sparse dictionary learning for grayscale image representation as an optimization problem, efficiently solved via orthogonal matching pursuit (OMP) and singular value decomposition (SVD). Following this work, we propose a multiscale learned representation, obtained by using an efficient quadtree decomposition of the learned dictionary and overlapping image patches. The proposed framework provides an alternative to predefined dictionaries such as wavelets and is shown to lead to state-of-the-art results in a number of image and video enhancement and restoration applications. This paper describes the proposed framework and accompanies it by numerous examples demonstrating its strength.

Journal ArticleDOI
10 Dec 2008-Neuron
TL;DR: This study reconstructed visual images by combining local image bases of multiple scales, whose contrasts were independently decoded from fMRI activity by automatically selecting relevant voxels and exploiting their correlated patterns.

Journal ArticleDOI
TL;DR: This work presents an online method that makes it possible to detect when an image comes from an already perceived scene using local shape and color information, and extends the bag-of-words method used in image classification to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability.
Abstract: In robotic applications of visual simultaneous localization and mapping techniques, loop-closure detection and global localization are two issues that require the capacity to recognize a previously visited place from current camera measurements. We present an online method that makes it possible to detect when an image comes from an already perceived scene using local shape and color information. Our approach extends the bag-of-words method used in image classification to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability. We demonstrate the efficiency of our solution by real-time loop-closure detection under strong perceptual aliasing conditions in both indoor and outdoor image sequences taken with a handheld camera.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A CUDA implementation of the ldquobrute forcerdquo kNN search and it is shown a speed increase on synthetic and real data by up to one or two orders of magnitude depending on the data, with a quasi-linear behavior with respect to the data size in a given, practical range.
Abstract: Statistical measures coming from information theory represent interesting bases for image and video processing tasks such as image retrieval and video object tracking. For example, let us mention the entropy and the Kullback-Leibler divergence. Accurate estimation of these measures requires to adapt to the local sample density, especially if the data are high-dimensional. The k nearest neighbor (kNN) framework has been used to define efficient variable-bandwidth kernel-based estimators with such a locally adaptive property. Unfortunately, these estimators are computationally intensive since they rely on searching neighbors among large sets of d-dimensional vectors. This computational burden can be reduced by pre-structuring the data, e.g. using binary trees as proposed by the approximated nearest neighbor (ANN) library. Yet, the recent opening of graphics processing units (GPU) to general-purpose computation by means of the NVIDIA CUDA API offers the image and video processing community a powerful platform with parallel calculation capabilities. In this paper, we propose a CUDA implementation of the ldquobrute forcerdquo kNN search and we compare its performances to several CPU-based implementations including an equivalent brute force algorithm and ANN. We show a speed increase on synthetic and real data by up to one or two orders of magnitude depending on the data, with a quasi-linear behavior with respect to the data size in a given, practical range.

Journal ArticleDOI
TL;DR: This work cast the image-ranking problem into the task of identifying "authority" nodes on an inferred visual similarity graph and proposes VisualRank to analyze the visual link structures among images and describes the techniques required to make this system practical for large-scale deployment in commercial search engines.
Abstract: Because of the relative ease in understanding and processing text, commercial image-search systems often rely on techniques that are largely indistinguishable from text search. Recently, academic studies have demonstrated the effectiveness of employing image-based features to provide either alternative or additional signals to use in this process. However, it remains uncertain whether such techniques will generalize to a large number of popular Web queries and whether the potential improvement to search quality warrants the additional computational cost. In this work, we cast the image-ranking problem into the task of identifying "authority" nodes on an inferred visual similarity graph and propose VisualRank to analyze the visual link structures among images. The images found to be "authorities" are chosen as those that answer the image-queries well. To understand the performance of such an approach in a real system, we conducted a series of large-scale experiments based on the task of retrieving images for 2,000 of the most popular products queries. Our experimental results show significant improvement, in terms of user satisfaction and relevancy, in comparison to the most recent Google image search results. Maintaining modest computational cost is vital to ensuring that this procedure can be used in practice; we describe the techniques required to make this system practical for large-scale deployment in commercial search engines.