Author
Cornelia Fermüller
Bio: Cornelia Fermüller is an academic researcher from the University of Maryland, College Park. The author has contributed to research topics including motion estimation and the motion field. The author has an h-index of 38 and has co-authored 211 publications receiving 4,589 citations.
Papers published on a yearly basis
Papers
TL;DR: The multifractal spectrum (MFS) is introduced, a new texture signature that is invariant under bi-Lipschitz maps, which include view-point changes and non-rigid deformations of the texture surface, as well as local affine illumination changes.
Abstract: Image texture provides a rich visual description of the surfaces in a scene. Many texture signatures based on various statistical descriptions and local measurements have been developed. Existing signatures, in general, are not invariant to 3D geometric transformations, which is a serious limitation for many applications. In this paper we introduce a new texture signature, called the multifractal spectrum (MFS). The MFS is invariant under bi-Lipschitz maps, which include view-point changes and non-rigid deformations of the texture surface, as well as local affine illumination changes. It provides an efficient framework combining global spatial invariance and local robust measurements. Intuitively, the MFS can be viewed as a "better histogram" with greater robustness to environmental changes and the advantage of capturing some of the geometrical distribution information encoded in the texture. Experiments demonstrate that the MFS codes the essential structure of textures with very low dimension, and thus represents a useful tool for texture classification.
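For intuition, a minimal sketch of an MFS-style signature follows, assuming a grayscale image given as a 2D array: per-pixel Hölder-like exponents are estimated by box counting over a few window sizes, and the box-counting dimension of each exponent level set forms the signature. The window radii, bin count, and box sizes below are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch of an MFS-style texture signature (illustrative, not the
# paper's reference implementation). Assumes a grayscale image as a 2D numpy
# array; window radii, bin count, and box sizes are arbitrary choices.
import numpy as np
from scipy.ndimage import uniform_filter

def holder_exponents(img, radii=(1, 2, 3, 4)):
    """Per-pixel slope of log(window mass) vs. log(window size)."""
    img = img.astype(np.float64) + 1e-6            # avoid log(0)
    log_sizes = np.log([2 * r + 1 for r in radii])
    log_masses = []
    for r in radii:
        side = 2 * r + 1
        mass = uniform_filter(img, size=side) * side * side
        log_masses.append(np.log(mass))
    log_masses = np.stack(log_masses)              # (n_scales, H, W)
    x = log_sizes - log_sizes.mean()
    num = np.tensordot(x, log_masses - log_masses.mean(axis=0), axes=(0, 0))
    return num / (x * x).sum()

def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8, 16)):
    """Box-counting (fractal) dimension of a binary mask."""
    h, w = mask.shape
    counts = []
    for s in box_sizes:
        grid = np.zeros(((h + s - 1) // s, (w + s - 1) // s), dtype=bool)
        ys, xs = np.nonzero(mask)
        grid[ys // s, xs // s] = True
        counts.append(max(int(grid.sum()), 1))
    x = np.log(1.0 / np.asarray(box_sizes, dtype=np.float64))
    return np.polyfit(x, np.log(counts), 1)[0]      # slope = dimension estimate

def mfs_signature(img, n_bins=26):
    """Fractal dimension of each Holder-exponent level set."""
    alpha = holder_exponents(img)
    edges = np.linspace(alpha.min(), alpha.max() + 1e-9, n_bins + 1)
    return np.array([box_counting_dimension((alpha >= lo) & (alpha < hi))
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Hypothetical usage on a random texture patch:
sig = mfs_signature(np.random.default_rng(0).random((128, 128)))
print(sig.shape)    # (26,)
```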
300 citations
26 May 2015
TL;DR: This work proposes two approaches for learning affordances from local shape and geometry primitives, superpixel-based hierarchical matching pursuit (S-HMP) and structured random forests (SRF), and introduces a large RGB-Depth dataset in which tool parts are labeled with multiple affordances and their relative rankings.
Abstract: As robots begin to collaborate with humans in everyday workspaces, they will need to understand the functions of tools and their parts. To cut an apple or hammer a nail, robots need not just to know the tool's name; they must also localize the tool's parts and identify their functions. Intuitively, the geometry of a part is closely related to its possible functions, or its affordances. Therefore, we propose two approaches for learning affordances from local shape and geometry primitives: 1) superpixel-based hierarchical matching pursuit (S-HMP); and 2) structured random forests (SRF). Moreover, since a part can be used in many ways, we introduce a large RGB-Depth dataset where tool parts are labeled with multiple affordances and their relative rankings. With ranked affordances, we evaluate the proposed methods on 3 cluttered scenes and over 105 kitchen, workshop and garden tools, using ranked correlation and a weighted F-measure score [26]. Experimental results over sequences containing clutter, occlusions, and viewpoint changes show that the approaches return precise predictions that could be used by a robot. S-HMP achieves high accuracy but at a significant computational cost, while SRF provides slightly less accurate predictions in real time. Finally, we validate the effectiveness of our approaches on the Cornell Grasping Dataset [25] for detecting graspable regions, and achieve state-of-the-art performance.
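As a rough illustration of the forest-based route, the sketch below trains a plain per-pixel random forest (sklearn) on precomputed local geometric descriptors to predict affordance labels. This is not the paper's structured random forest; the feature layout, label set, and hyperparameters are assumptions made for illustration.

```python
# A rough sketch of per-pixel affordance prediction with a plain random forest
# (sklearn), not the structured random forest (SRF) from the paper. Feature
# layout, label set, and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_affordance_forest(features, labels, n_trees=50):
    """features: (N, H, W, D) per-pixel geometric descriptors (e.g. depth,
    surface normals, curvature); labels: (N, H, W) affordance label per pixel."""
    X = features.reshape(-1, features.shape[-1])
    y = labels.reshape(-1)
    clf = RandomForestClassifier(n_estimators=n_trees, max_depth=15, n_jobs=-1)
    clf.fit(X, y)
    return clf

def predict_affordance_map(clf, feature_image):
    """feature_image: (H, W, D) -> (H, W) predicted affordance labels."""
    h, w, d = feature_image.shape
    return clf.predict(feature_image.reshape(-1, d)).reshape(h, w)

# Hypothetical usage with synthetic data (0 = none, 1 = grasp, 2 = cut):
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 64, 64, 7))
labs = rng.integers(0, 3, size=(4, 64, 64))
clf = train_affordance_forest(feats, labs)
print(predict_affordance_map(clf, feats[0]).shape)   # (64, 64)
```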
235 citations
Proceedings Article
25 Jan 2015
TL;DR: A system that learns manipulation action plans by processing unconstrained videos from the World Wide Web, robustly generating the sequence of atomic actions underlying longer actions observed in video in order to acquire knowledge for robots.
Abstract: In order to advance action generation and creation in robots beyond simple learned schemas, we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions underlying longer actions observed in video in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation. Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by "watching" unconstrained videos with high accuracy.
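To make the two-level idea concrete, here is a minimal sketch of how per-frame grasp-type and object predictions (assumed to come from the two CNN modules) could be collapsed into "visual sentence" segments. The grasp-to-action mapping is a toy hand-written rule set, not the paper's probabilistic grammar.

```python
# A toy sketch of composing per-frame grasp-type and object predictions into
# atomic-action "visual sentences". The GRASP_TO_ACTION mapping and the labels
# are hypothetical; the paper uses a probabilistic manipulation action grammar.
from dataclasses import dataclass

@dataclass
class FrameObservation:
    grasp_type: str   # e.g., "power", "precision", "rest"
    obj: str          # e.g., "knife", "apple"

# Illustrative mapping from grasp type to a likely atomic action.
GRASP_TO_ACTION = {
    "power": "hold",
    "precision": "cut",
    "rest": "idle",
}

def parse_visual_sentences(frames):
    """Collapse consecutive frames with the same (action, object) into
    atomic-action segments and emit (action, object) sentences."""
    sentences = []
    prev = None
    for f in frames:
        action = GRASP_TO_ACTION.get(f.grasp_type, "idle")
        current = (action, f.obj)
        if action != "idle" and current != prev:
            sentences.append(current)
        prev = current
    return sentences

# Hypothetical usage:
frames = [FrameObservation("power", "knife")] * 3 + \
         [FrameObservation("precision", "apple")] * 4
print(parse_visual_sentences(frames))   # [('hold', 'knife'), ('cut', 'apple')]
```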
202 citations
TL;DR: The suggested approach to understanding behavioural vision, aimed at realizing the relationship between perception and action, builds on two earlier approaches, the Medusa philosophy and the Synthetic approach, and calls for synthesizing an artificial vision system by studying vision competences of increasing complexity while pursuing the integration of the perceptual components with action and learning modules.
Abstract: Our work on active vision has recently focused on the computational modelling of navigational tasks, where our investigations were guided by the idea of approaching vision for behavioural systems in the form of modules that are directly related to perceptual tasks. These studies led us to branch in various directions and inquire into the problems that have to be addressed in order to obtain an overall understanding of perceptual systems. In this paper, we present our views about the architecture of vision systems, about how to tackle the design and analysis of perceptual systems, and about promising future research directions. Our suggested approach to understanding behavioural vision, aimed at realizing the relationship between perception and action, builds on two earlier approaches, the Medusa philosophy [1] and the Synthetic approach [2]. The resulting framework calls for synthesizing an artificial vision system by studying vision competences of increasing complexity and, at the same time, pursuing the integration of the perceptual components with action and learning modules. We expect that computer vision research in the future will progress in tight collaboration with many other disciplines that are concerned with empirical approaches to vision, i.e. the understanding of biological vision. Throughout the paper, we describe biological findings that motivate computational arguments which we believe will influence studies of computer vision in the near future.
182 citations
01 Oct 2018
TL;DR: This paper presents a novel event stream representation which enables us to utilize information about the dynamic (temporal) component of the event stream, and demonstrates the framework on the task of independent motion detection and tracking, where it is used to locate differently moving objects in challenging situations of very fast motion.
Abstract: Event-based vision sensors, such as the Dynamic Vision Sensor (DVS), are ideally suited for real-time motion analysis. The unique properties encompassed in the readings of such sensors provide high temporal resolution, superior sensitivity to light and low latency. These properties provide the grounds to estimate motion efficiently and reliably in the most sophisticated scenarios, but these advantages come at a price: modern event-based vision sensors have extremely low resolution, produce a lot of noise and require the development of novel algorithms to handle the asynchronous event stream. This paper presents a new, efficient approach to object tracking with asynchronous cameras. We present a novel event stream representation which enables us to utilize information about the dynamic (temporal) component of the event stream. The 3D geometry of the event stream is approximated with a parametric model to motion-compensate for the camera (without feature tracking or explicit optical flow computation), and then moving objects that don't conform to the model are detected in an iterative process. We demonstrate our framework on the task of independent motion detection and tracking, where we use the temporal model inconsistencies to locate differently moving objects in challenging situations of very fast motion.
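As a simplified illustration of motion compensation on an event stream (not the paper's time-image model or implementation), the sketch below fits a single global translational velocity by brute-force search maximizing the contrast of the warped event image; events that remain blurred after compensation would be candidates for independent motion. The event format, velocity range, and synthetic data are assumptions.

```python
# A simplified illustration of motion compensation on an asynchronous event
# stream (not the paper's time-image model or code). Events are assumed to be
# (x, y, t) arrays; a single global translational velocity is found by a
# brute-force search that maximizes the contrast of the warped event image.
import numpy as np

def warp_events(xs, ys, ts, vx, vy):
    """Shift each event back along a candidate velocity to time t = 0."""
    return xs - vx * ts, ys - vy * ts

def event_image(xs, ys, shape):
    """Accumulate warped events into a per-pixel count image."""
    img = np.zeros(shape, dtype=np.float64)
    xi = np.clip(np.round(xs).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(ys).astype(int), 0, shape[0] - 1)
    np.add.at(img, (yi, xi), 1.0)
    return img

def fit_global_motion(xs, ys, ts, shape, v_range=np.linspace(-50, 50, 21)):
    """Grid-search the (vx, vy) that gives the sharpest (highest-variance)
    motion-compensated event image."""
    best_v, best_score = (0.0, 0.0), -np.inf
    for vx in v_range:
        for vy in v_range:
            wx, wy = warp_events(xs, ys, ts, vx, vy)
            score = event_image(wx, wy, shape).var()
            if score > best_score:
                best_v, best_score = (vx, vy), score
    return best_v

# Hypothetical usage: events from 30 scene points translating at (20, -10) px/s.
rng = np.random.default_rng(0)
px, py = rng.uniform(60, 120, 30), rng.uniform(60, 120, 30)
ts = np.repeat(rng.uniform(0.0, 1.0, 100), 30)   # 100 time samples per point
xs = np.tile(px, 100) + 20.0 * ts
ys = np.tile(py, 100) - 10.0 * ts
print(fit_global_motion(xs, ys, ts, (200, 200)))  # should be close to (20, -10)
```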
168 citations
Cited by
01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either have practical significance or are of theoretical importance, and describes numerous important application areas such as image-based rendering and digital libraries.
Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and sufficient detail to build useful applications. Users learn techniques that have proven useful through first-hand experience and a wide range of mathematical methods. A CD-ROM included with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, the book covers essential topics that either have practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image-based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.
3,627 citations