13 Jun 2010
TL;DR: A new representation for food items is proposed that calculates pairwise statistics between local features computed over a soft pixel-level segmentation of the image into eight ingredient types and is significantly more accurate at identifying food than existing methods.
Abstract: Food recognition is difficult because food items are deformable objects that exhibit significant variations in appearance. We believe the key to recognizing food is to exploit the spatial relationships between different ingredients (such as meat and bread in a sandwich). We propose a new representation for food items that calculates pairwise statistics between local features computed over a soft pixel-level segmentation of the image into eight ingredient types. We accumulate these statistics in a multi-dimensional histogram, which is then used as a feature vector for a discriminative classifier. Our experiments show that the proposed representation is significantly more accurate at identifying food than existing methods.
263 citations
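The pairwise-statistics idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pairwise_label_histogram`, the random pair sampling, the use of distance as the only pairwise statistic, and the bin count are all assumptions made for illustration. It takes a soft segmentation of shape (height, width, K) and accumulates soft votes for each pair of ingredient labels into a K x K x distance-bin histogram that could serve as a feature vector.

```python
import numpy as np

def pairwise_label_histogram(soft_seg, n_pairs=2000, n_dist_bins=4, seed=0):
    """Hypothetical sketch: histogram of label co-occurrences by pixel distance.

    soft_seg: array of shape (H, W, K); soft_seg[y, x] is a probability
    distribution over K ingredient types (K = 8 in the abstract).
    """
    h, w, k = soft_seg.shape
    rng = np.random.default_rng(seed)
    # Longest possible distance in the image, guarded against 1x1 inputs.
    max_dist = max(float(np.hypot(h - 1, w - 1)), 1.0)

    # Sample random pixel pairs (an assumption; other sampling schemes work too).
    ys = rng.integers(0, h, size=(n_pairs, 2))
    xs = rng.integers(0, w, size=(n_pairs, 2))
    dist = np.hypot(ys[:, 0] - ys[:, 1], xs[:, 0] - xs[:, 1])
    bins = np.minimum((dist / max_dist * n_dist_bins).astype(int), n_dist_bins - 1)

    p_a = soft_seg[ys[:, 0], xs[:, 0]]  # (n_pairs, K) label distribution at pixel A
    p_b = soft_seg[ys[:, 1], xs[:, 1]]  # (n_pairs, K) label distribution at pixel B
    # Soft vote for every (label at A, label at B) combination.
    votes = p_a[:, :, None] * p_b[:, None, :]  # (n_pairs, K, K)

    hist = np.zeros((k, k, n_dist_bins))
    for b in range(n_dist_bins):
        hist[:, :, b] = votes[bins == b].sum(axis=0)

    total = hist.sum()
    if total > 0:
        hist = hist / total  # normalize so images of different size are comparable
    return hist.ravel()
```

The flattened histogram would then be passed to a discriminative classifier, as the abstract describes; the choice of classifier is left open here.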
07 Jun 2015
TL;DR: This paper proposes to regularize over larger distances using object-category specific disparity proposals (displets), which are sampled using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image.
Abstract: Stereo techniques have witnessed tremendous progress over the last decades, yet some aspects of the problem still remain challenging today. Striking examples are reflecting and textureless surfaces, which cannot easily be recovered using traditional local regularizers. In this paper, we therefore propose to regularize over larger distances using object-category specific disparity proposals (displets), which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The proposed displets encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as a non-local regularizer for the challenging object class ‘car’ into a superpixel-based CRF framework and demonstrate its benefits on the KITTI stereo evaluation. At the time of submission, our approach ranks first across all KITTI stereo leaderboards.
242 citations
03 Dec 2012
TL;DR: This work addresses the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture by formulating this task as a multiple-output structured-output prediction problem with a loss-function that effectively captures the setup of the problem.
Abstract: We address the problem of generating multiple hypotheses for structured prediction tasks that involve interaction with users or successive components in a cascaded architecture. Given a set of multiple hypotheses, such components/users typically have the ability to retrieve the best (or approximately the best) solution in this set. The standard approach for handling such a scenario is to first learn a single-output model and then produce M-Best Maximum a Posteriori (MAP) hypotheses from this model. In contrast, we learn to produce multiple outputs by formulating this task as a multiple-output structured-output prediction problem with a loss-function that effectively captures the setup of the problem. We present a max-margin formulation that minimizes an upper-bound on this loss-function. Experimental results on image segmentation and protein side-chain prediction show that our method outperforms conventional approaches used for this type of scenario and leads to substantial improvements in prediction accuracy.
203 citations
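The abstract above notes that a downstream component or user can typically retrieve the best solution from a set of hypotheses. One natural way to score such a set, sketched below, is to charge only the loss of its best member. This is an illustrative reading, not the paper's formulation: the names `hamming_loss` and `oracle_loss` and the choice of Hamming loss as the task loss are assumptions.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of positions where the two labelings disagree."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

def oracle_loss(y_true, hypotheses, loss=hamming_loss):
    """Hypothetical set-level loss: the loss of the best hypothesis in the set.

    Returns (lowest loss, index of the hypothesis that achieves it).
    """
    losses = [loss(y_true, h) for h in hypotheses]
    best = int(np.argmin(losses))
    return losses[best], best

# Usage: three candidate labelings of a four-element output.
y_true = [0, 1, 1, 0]
hypotheses = [[1, 1, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
best_loss, best_index = oracle_loss(y_true, hypotheses)
print(best_loss, best_index)  # 0.25 1
```

Under a loss of this kind, a set is rewarded for containing at least one good hypothesis, so diverse sets can score better than several near-duplicates of a single prediction.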