Daniel Cabrini Hauagge
Bio: Daniel Cabrini Hauagge is an academic researcher from Cornell University. The author has contributed to research in topics such as ambient occlusion and image formation, has an h-index of 7, and has co-authored 10 publications receiving 476 citations. Previous affiliations of Daniel Cabrini Hauagge include the State University of Campinas and Brown University.
TL;DR: A unified approach that combines many features and classifiers, requires less training, and is better suited to some problems than a naive method in which all features are simply concatenated and fed independently to each classification algorithm.
Abstract: Contemporary Vision and Pattern Recognition problems such as face recognition, fingerprint identification, image categorization, and DNA sequencing often have an arbitrarily large number of classes and properties to consider. Tackling such complex problems with just one feature descriptor is difficult, and feature fusion may become mandatory. Although straightforward feature fusion is quite effective for some problems, it can yield unexpected classification results when the different features are not properly normalized and preprocessed. Besides, it has the drawback of increasing dimensionality, which might require more training data. To cope with these problems, this paper introduces a unified approach that combines many features and classifiers, requires less training, and is better suited to some problems than a naive method in which all features are simply concatenated and fed independently to each classification algorithm. The presented technique is also amenable to continuous learning, both when refining a learned model and when adding new classes to be discriminated. The introduced fusion approach is validated on a multi-class fruit-and-vegetable categorization task in a semi-controlled environment, such as a distribution center or a supermarket cashier. The results show that the solution reduces the classification error by up to 15 percentage points with respect to the baseline.
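The contrast between naive early fusion (concatenating all descriptors for one classifier) and a late-fusion combination of per-feature classifiers can be sketched as follows. This is only an illustrative toy using a nearest-centroid classifier and a sum rule; it is not the paper's actual fusion framework, and all names, data, and parameters below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy feature spaces for the same 2-class samples (e.g. color and texture).
n = 200
labels = rng.integers(0, 2, n)
feat_color = labels[:, None] + 0.8 * rng.standard_normal((n, 2))
feat_texture = labels[:, None] + 0.8 * rng.standard_normal((n, 3))

def nearest_centroid_scores(feats, labels):
    """Per-class similarity scores from a nearest-centroid classifier."""
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return -dists  # higher score = closer to that class centroid

# Early fusion: concatenate all features, train a single classifier.
early = nearest_centroid_scores(np.hstack([feat_color, feat_texture]), labels)

# Late fusion: one classifier per feature space, combine scores (sum rule).
late = (nearest_centroid_scores(feat_color, labels)
        + nearest_centroid_scores(feat_texture, labels))

for name, scores in [("early", early), ("late", late)]:
    acc = (scores.argmax(axis=1) == labels).mean()
    print(f"{name} fusion accuracy: {acc:.2f}")
```

With well-scaled features both schemes work, but late fusion sidesteps the normalization and dimensionality issues the abstract describes, since each classifier only ever sees its own feature space.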
16 Jun 2012
TL;DR: A new technique for extracting local features from images of architectural scenes, based on detecting and representing local symmetries, which improves matching performance on the difficult task of matching challenging pairs of photos of urban scenes.
Abstract: We present a new technique for extracting local features from images of architectural scenes, based on detecting and representing local symmetries. These new features are motivated by the fact that local symmetries, at different scales, are a fundamental characteristic of many urban images, and are potentially more invariant to large appearance changes than lower-level features such as SIFT. Hence, we apply these features to the problem of matching challenging pairs of photos of urban scenes. Our features are based on simple measures of local bilateral and rotational symmetries computed using local image operations. These measures are used both for feature detection and for computing descriptors. We demonstrate our method on a challenging new dataset containing image pairs exhibiting a range of dramatic variations in lighting, age, and rendering style, and show that our features can improve matching performance for this difficult task.
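The idea of a local symmetry measure computed with simple image operations can be illustrated with a toy score that compares a patch against its horizontal mirror via normalized correlation. This sketch is not the paper's detector or descriptor; the function and test patches below are invented for illustration.

```python
import numpy as np

def bilateral_symmetry_score(patch):
    """Score horizontal mirror symmetry of a square patch, in [-1, 1].

    Compares the patch with its left-right mirror via normalized
    correlation; 1 means perfectly symmetric about the vertical axis.
    """
    mirrored = patch[:, ::-1]
    a = patch - patch.mean()
    b = mirrored - mirrored.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        return 1.0  # a constant patch is trivially symmetric
    return float((a * b).sum() / denom)

# A mirror-symmetric patch (a V-shaped intensity profile) scores high...
x = np.abs(np.arange(-3, 4))
symmetric = np.tile(x, (7, 1)).astype(float)
# ...while a left-to-right ramp is anti-correlated with its own mirror.
ramp = np.tile(np.arange(7, dtype=float), (7, 1))
print(bilateral_symmetry_score(symmetric))  # ≈ 1.0
print(bilateral_symmetry_score(ramp))       # ≈ -1.0
```

Evaluating such a score at every pixel and over several scales yields a symmetry map in which strong local maxima can serve as feature detections, which is the spirit (though not the implementation) of the paper's approach.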
01 Jan 2014
TL;DR: This work proposes the use of sophisticated outdoor illumination models, developed in the computer graphics community, for estimating appearance and timestamps from a large set of uncalibrated images of an outdoor scene, and develops a data-driven method for estimating per-point albedo and local visibility information from a set of Internet photos taken under varying, unknown illuminations.
Abstract: Natural illumination from the sun and sky plays a significant role in the appearance of outdoor scenes. We propose the use of sophisticated outdoor illumination models, developed in the computer graphics community, for estimating appearance and timestamps from a large set of uncalibrated images of an outdoor scene. We first present an analysis of the relationship between these illumination models and the geolocation, time, surface orientation, and local visibility at a scene point. We then use this relationship to devise a data-driven method for estimating per-point albedo and local visibility information from a set of Internet photos taken under varying, unknown illuminations. Our approach significantly extends prior work on appearance estimation to work with sun-sky models, and enables new applications, such as computing timestamps for individual photos using shading information.
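The per-point albedo estimation can be illustrated under a toy Lambertian assumption: if the shading each pixel receives under each illumination were known (in the paper it comes from a sun-sky model together with geolocation, time, surface orientation, and visibility), albedo follows from a per-pixel least-squares fit. Everything below is an invented sketch of that idea, not the paper's data-driven method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Lambertian model: observed intensity I_t(p) = albedo(p) * shading_t(p).
n_pixels, n_images = 5, 20
true_albedo = rng.uniform(0.2, 0.9, n_pixels)
shading = rng.uniform(0.1, 1.0, (n_images, n_pixels))
observed = shading * true_albedo + 0.01 * rng.standard_normal((n_images, n_pixels))

# Per-pixel least squares over the image stack:
#   albedo = argmin_a sum_t (I_t - a * s_t)^2 = (sum_t I_t s_t) / (sum_t s_t^2)
albedo_hat = (observed * shading).sum(axis=0) / (shading ** 2).sum(axis=0)
print(np.round(true_albedo, 3))
print(np.round(albedo_hat, 3))
```

The same fit run in reverse, solving for the illumination parameters that best explain a photo given estimated albedo, is the intuition behind using shading information to compute timestamps.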
23 Jun 2013
TL;DR: This work shows that ambient occlusion can be approximated using simple, per-pixel statistics over image stacks, based on a simplified image formation model, and uses the derived AO measure to compute reflectance and illumination for objects without relying on additional smoothness priors.
Abstract: We present a method for computing ambient occlusion (AO) for a stack of images of a scene from a fixed viewpoint. Ambient occlusion, a concept common in computer graphics, characterizes the local visibility at a point: it approximates how much light can reach that point from different directions without getting blocked by other geometry. While AO has received surprisingly little attention in vision, we show that it can be approximated using simple, per-pixel statistics over image stacks, based on a simplified image formation model. We use our derived AO measure to compute reflectance and illumination for objects without relying on additional smoothness priors, and demonstrate state-of-the-art performance on the MIT Intrinsic Images benchmark. We also demonstrate our method on several synthetic and real scenes, including 3D printed objects with known ground truth geometry.
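The core idea, that simple per-pixel statistics over an image stack can approximate ambient occlusion under a simplified image formation model, can be shown with a toy: if intensity factors into a per-image light level times per-pixel albedo and visibility, then the per-pixel mean over the stack recovers relative visibility. The model and names below are assumptions for illustration, not the paper's exact statistic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy image stack: fixed viewpoint, varying illumination strengths.
# Simplified model: I_t(p) = light_t * albedo(p) * visibility(p),
# where visibility(p) in (0, 1] plays the role of the ambient-occlusion term.
n_images, h, w = 50, 4, 4
visibility = rng.uniform(0.2, 1.0, (h, w))
albedo = np.full((h, w), 0.7)
light = rng.uniform(0.5, 1.5, n_images)
stack = light[:, None, None] * albedo * visibility

# Per-pixel statistic: mean over the stack, normalized by its maximum.
# The light and albedo factors cancel, leaving relative visibility.
ao_estimate = stack.mean(axis=0)
ao_estimate /= ao_estimate.max()
print(np.allclose(ao_estimate, visibility / visibility.max()))  # True
```

In this toy the albedo is constant, so the cancellation is exact; handling spatially varying albedo is precisely what makes the real problem, and the paper's statistics, more involved.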
12 Oct 2008
TL;DR: A system that solves a multi-class produce categorization problem using statistical color, texture, and structural appearance (bag-of-features) descriptors, combined in many different ways to improve the overall accuracy of the system.
Abstract: We propose a system to solve a multi-class produce categorization problem. For that, we use statistical color, texture, and structural appearance descriptors (bag-of-features). As the best combination setup is not known for our problem, we combine several individual features from the state-of-the-art in many different ways to assess how they interact to improve the overall accuracy of the system. We validate the system using an image data set collected on our local fruits and vegetables distribution center.
01 Jan 2006
TL;DR: A textbook covering probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, sequential data, and methods for combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.
Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and sufficient detail to build useful applications. Users learn techniques that have proven useful in first-hand experience, along with a wide range of mathematical methods. A CD-ROM included with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, the book includes essential topics that either reflect practical significance or are of theoretical importance, discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image-based rendering and digital libraries, and many important algorithms are broken down and illustrated in pseudocode. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.
07 Jun 2015
TL;DR: A new place recognition approach is developed that combines an efficient synthesis of novel views with a compact indexable image representation and significantly outperforms other large-scale place recognition techniques on this challenging data.
Abstract: We address the problem of large-scale visual place recognition for situations where the scene undergoes a major change in appearance, for example, due to illumination (day/night), change of seasons, aging, or structural modifications over time such as buildings built or destroyed. Such situations represent a major challenge for current large-scale place recognition methods. This work has the following three principal contributions. First, we demonstrate that matching across large changes in the scene appearance becomes much easier when both the query image and the database image depict the scene from approximately the same viewpoint. Second, based on this observation, we develop a new place recognition approach that combines (i) an efficient synthesis of novel views with (ii) a compact indexable image representation. Third, we introduce a new challenging dataset of 1,125 camera-phone query images of Tokyo that contain major changes in illumination (day, sunset, night) as well as structural changes in the scene. We demonstrate that the proposed approach significantly outperforms other large-scale place recognition techniques on this challenging data.
27 Jul 2014
TL;DR: This paper introduces Intrinsic Images in the Wild, a large-scale, public dataset for evaluating intrinsic image decompositions of indoor scenes, and develops a dense CRF-based intrinsic image algorithm for images in the wild that outperforms a range of state-of-the-art intrinsic image algorithms.
Abstract: Intrinsic image decomposition separates an image into a reflectance layer and a shading layer. Automatic intrinsic image decomposition remains a significant challenge, particularly for real-world scenes. Advances on this longstanding problem have been spurred by public datasets of ground truth data, such as the MIT Intrinsic Images dataset. However, the difficulty of acquiring ground truth data has meant that such datasets cover a small range of materials and objects. In contrast, real-world scenes contain a rich range of shapes and materials, lit by complex illumination. In this paper we introduce Intrinsic Images in the Wild, a large-scale, public dataset for evaluating intrinsic image decompositions of indoor scenes. We create this benchmark through millions of crowdsourced annotations of relative comparisons of material properties at pairs of points in each scene. Crowdsourcing enables a scalable approach to acquiring a large database, and uses the ability of humans to judge material comparisons, despite variations in illumination. Given our database, we develop a dense CRF-based intrinsic image algorithm for images in the wild that outperforms a range of state-of-the-art intrinsic image algorithms. Intrinsic image decomposition remains a challenging problem; we release our code and database publicly to support future research on this problem, available online at http://intrinsic.cs.cornell.edu/.
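The multiplicative model the abstract starts from, image = reflectance × shading, and the log-domain form most intrinsic image algorithms (including CRF-based ones) operate in, can be written out on a toy example. The arrays below are invented values, not data from the paper.

```python
import numpy as np

# Intrinsic model: image = reflectance * shading, per pixel.
reflectance = np.array([[0.8, 0.8, 0.2],
                        [0.8, 0.2, 0.2]])
shading = np.array([[1.0, 0.6, 0.6],
                    [0.3, 0.3, 1.0]])
image = reflectance * shading

# In the log domain the product becomes a sum, which is why intrinsic
# image methods typically work with log(I) = log(R) + log(S).
assert np.allclose(np.log(image), np.log(reflectance) + np.log(shading))

# A relative reflectance judgment, like the crowdsourced annotations in
# the paper, compares R at two points rather than estimating it directly:
print(reflectance[0, 0] > reflectance[0, 2])  # True: point (0,0) is "lighter"
```

Note how pairwise judgments sidestep the ill-posedness of the decomposition: humans can rank the two reflectances even though neither absolute value is observable from `image` alone.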
TL;DR: A novel two-phase method combining CNN transfer learning and web data augmentation that helps popular deep CNNs achieve better performance; in particular, ResNet can outperform all state-of-the-art models on six small datasets.
Abstract: Since a Convolutional Neural Network (CNN) won the image classification competition in 2012 (ILSVRC12), deep CNNs have received a great deal of attention. The success of CNNs is attributed to their superior multi-scale, high-level image representations, as opposed to hand-engineered low-level features. However, estimating the millions of parameters of a deep CNN requires a large number of annotated samples, which currently prevents many superior deep CNNs (such as AlexNet, VGG, ResNet) from being applied to problems with limited training data. To address this problem, a novel two-phase method combining CNN transfer learning and web data augmentation is proposed. With our method, the useful feature representation of a pre-trained network can be efficiently transferred to the target task, and the original dataset can be augmented with the most valuable Internet images for classification. Our method not only greatly reduces the need for large training data, but also effectively expands the training dataset. Both aspects of the method contribute to a considerable reduction of over-fitting of deep CNNs on small datasets. In addition, we successfully apply Bayesian optimization to the tough problem of hyper-parameter tuning in network fine-tuning. Our solution is applied to six public small datasets. Extensive experiments show that, compared to traditional methods, our solution helps popular deep CNNs achieve better performance. In particular, ResNet can outperform all state-of-the-art models on the six small datasets. The experimental results suggest that the proposed solution is a useful tool for practical problems that require applying deep CNNs to small datasets.
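The freeze-and-fine-tune idea behind CNN transfer learning can be sketched without any deep learning framework: a fixed (pretend pre-trained) feature extractor stays frozen while only a small classification head is trained on the limited target data. The random projection, dataset, and hyper-parameters below are all invented stand-ins, not the paper's networks or settings.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for a pre-trained network body: a fixed projection plus ReLU.
# In transfer learning these weights stay frozen; only the new head
# below is trained on the small target dataset.
frozen_w = rng.standard_normal((10, 4))

def features(x):
    return np.maximum(x @ frozen_w, 0.0)  # frozen layer + ReLU

# Small target dataset: two classes separated by a mean shift.
n = 120
y = rng.integers(0, 2, n)
x = rng.standard_normal((n, 10)) + 2.0 * y[:, None]
f = features(x)

# New head: logistic regression trained by gradient descent.
w, b = np.zeros(f.shape[1]), 0.0
for _ in range(500):
    z = np.clip(f @ w + b, -30.0, 30.0)  # clip to keep exp() stable
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.5 * (f.T @ (p - y) / n)
    b -= 0.5 * float((p - y).mean())

acc = (((f @ w + b) > 0).astype(int) == y).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Because only the tiny head is fit, far fewer labeled samples are needed than for training the whole body, which is the over-fitting reduction the abstract describes; the paper's second phase then augments the small dataset with web images before fine-tuning.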