Author
Ching L. Teo
Other affiliations: DSO National Laboratories
Bio: Ching L. Teo is an academic researcher from the University of Maryland, College Park. The author has contributed to research on topics including language models and the cognitive neuroscience of visual object recognition, has an h-index of 9, and has co-authored 16 publications receiving 409 citations. Previous affiliations of Ching L. Teo include DSO National Laboratories.
Papers
26 May 2015
TL;DR: This work proposes two approaches for learning affordances from local shape and geometry primitives, superpixel-based hierarchical matching pursuit (S-HMP) and structured random forests (SRF), and introduces a large RGB-Depth dataset in which tool parts are labeled with multiple affordances and their relative rankings.
Abstract: As robots begin to collaborate with humans in everyday workspaces, they will need to understand the functions of tools and their parts. To cut an apple or hammer a nail, robots need to not just know the tool's name, but they must localize the tool's parts and identify their functions. Intuitively, the geometry of a part is closely related to its possible functions, or its affordances. Therefore, we propose two approaches for learning affordances from local shape and geometry primitives: 1) superpixel based hierarchical matching pursuit (S-HMP); and 2) structured random forests (SRF). Moreover, since a part can be used in many ways, we introduce a large RGB-Depth dataset where tool parts are labeled with multiple affordances and their relative rankings. With ranked affordances, we evaluate the proposed methods on 3 cluttered scenes and over 105 kitchen, workshop and garden tools, using ranked correlation and a weighted F-measure score [26]. Experimental results over sequences containing clutter, occlusions, and viewpoint changes show that the approaches return precise predictions that could be used by a robot. S-HMP achieves high accuracy but at a significant computational cost, while SRF provides slightly less accurate predictions but in real-time. Finally, we validate the effectiveness of our approaches on the Cornell Grasping Dataset [25] for detecting graspable regions, and achieve state-of-the-art performance.
235 citations
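The geometry-to-affordance mapping described above lends itself to a compact illustration. Below is a minimal sketch of per-pixel affordance prediction with an off-the-shelf random forest, assuming synthetic stand-in features (surface normals, curvature, depth) in place of the paper's RGB-D primitives; it is not the authors' S-HMP or SRF implementation, and the affordance list and feature layout are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
AFFORDANCES = ["grasp", "cut", "pound", "scoop", "contain"]  # hypothetical label set

# Stand-in features: one row per pixel/superpixel,
# e.g. [nx, ny, nz, mean_curvature, depth] from an RGB-D patch.
X_train = rng.normal(size=(1000, 5))
y_train = rng.integers(0, len(AFFORDANCES), size=1000)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Rank affordances for each pixel of a new tool part.
for probs in clf.predict_proba(rng.normal(size=(3, 5))):
    ranked = sorted(zip(AFFORDANCES, probs), key=lambda p: -p[1])
    print(ranked[:2])  # top-2 affordances, mirroring the dataset's ranked labels
```

Sorting the per-class probabilities yields a ranking per pixel, which is the output format the ranked-affordance evaluation in the paper calls for.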
24 Dec 2012
TL;DR: A system is built that automatically constructs a tree structure from observations of an actor performing complex manipulation activities, providing a robust framework for complex activity recognition over real data and for disambiguating interleaved activities within the same sequence.
Abstract: There is good reason to believe that humans use some kind of recursive grammatical structure when they recognize and perform complex manipulation activities. We have built a system that automatically constructs a tree structure from observations of an actor performing such activities. The activity trees that result form a framework for search and understanding, tying action to language. We explore and evaluate the system by performing experiments over a novel complex activity dataset captured using synchronized Kinect and SR4000 Time of Flight cameras. Processing of the combined 3D and 2D image data provides the necessary terminals and events to build the tree from the bottom up. Experimental results highlight the contribution of the action grammar in: 1) providing a robust structure for complex activity recognition over real data and 2) disambiguating interleaved activities from within the same sequence.
50 citations
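The bottom-up tree construction the abstract describes can be sketched as a greedy shift-reduce parse over detected terminals. The reduction rules and event names below are hypothetical stand-ins, not the paper's manipulation grammar.

```python
# Hypothetical reduction rules: (left_label, right_label) -> parent_label.
RULES = {
    ("reach", "grasp"): "PickUp",
    ("PickUp", "move"): "Transport",
    ("Transport", "release"): "PlaceObject",
}

def parse(terminals):
    """Greedy shift-reduce: shift each terminal, then reduce while the
    top two nodes on the stack match a grammar rule."""
    stack = []
    for t in terminals:
        stack.append((t, []))  # a node is (label, children)
        while len(stack) >= 2 and (stack[-2][0], stack[-1][0]) in RULES:
            right, left = stack.pop(), stack.pop()
            stack.append((RULES[(left[0], right[0])], [left, right]))
    return stack

trees = parse(["reach", "grasp", "move", "release"])
print(trees[0][0], len(trees))  # PlaceObject 1 -- one tree spans the sequence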
07 Dec 2015
TL;DR: This work proposes a complete approach that links the detection of curved reflection symmetries to the production of symmetry-constrained segments of structures/regions in cluttered real images, enforcing global symmetrical consistency in the final segmentation.
Abstract: Symmetry, as one of the key components of Gestalt theory, provides an important mid-level cue that serves as input to higher visual processes such as segmentation. In this work, we propose a complete approach that links the detection of curved reflection symmetries to the production of symmetry-constrained segments of structures/regions in real images with clutter. For curved reflection symmetry detection, we leverage patch-based symmetric features to train a Structured Random Forest classifier that detects multiscale curved symmetries in 2D images. Next, using these curved symmetries, we modulate a novel symmetry-constrained foreground-background segmentation by their symmetry scores so as to enforce global symmetrical consistency in the final segmentation. This is achieved by imposing a pairwise symmetry prior that encourages symmetric pixels to take the same labels over an MRF-based representation of the input image edges; the final segmentation is obtained via graph cuts. Experimental results over four publicly available datasets containing annotated symmetric structures: 1) SYMMAX-300 [38], 2) BSD-Parts, 3) Weizmann Horse (both from [18]) and 4) NY-roads [35] demonstrate the approach's applicability to different environments with state-of-the-art performance.
43 citations
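To make the pairwise symmetry prior concrete, here is a minimal sketch of the binary MRF energy being minimized, with a simple iterated-conditional-modes sweep standing in for the exact graph cut; the pixel pairs, weights, and solver are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def energy(labels, unary, sym_pairs, lam=1.0):
    """labels: (N,) in {0,1}; unary: (N,2) costs; sym_pairs: list of (i, j, score)."""
    e = unary[np.arange(len(labels)), labels].sum()
    for i, j, score in sym_pairs:
        if labels[i] != labels[j]:
            e += lam * score  # prior: symmetric pixels should share a label
    return e

def icm(unary, sym_pairs, iters=10):
    """Iterated conditional modes: a simple stand-in for the exact graph cut."""
    labels = unary.argmin(axis=1)
    for _ in range(iters):
        for i in range(len(labels)):
            candidates = []
            for l in (0, 1):
                trial = labels.copy()
                trial[i] = l
                candidates.append((energy(trial, unary, sym_pairs), l))
            labels[i] = min(candidates)[1]
    return labels

# Three "pixels"; pixels 0 and 2 form a detected symmetric pair.
unary = np.array([[0.1, 2.0], [1.5, 0.2], [0.3, 1.0]])
print(icm(unary, [(0, 2, 0.8)]))  # -> [0 1 0]
```

Because the pairwise term is submodular, the real energy admits an exact minimum via an s-t min-cut, which is why the paper can use graph cuts rather than an approximate sweep like this one.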
06 Nov 2011
TL;DR: A novel approach, called active scene recognition, that utilizes high-level knowledge for scene recognition in an active vision framework by implementing an interaction between a reasoning module and a sensory module.
Abstract: This paper presents a novel approach to utilizing high-level knowledge for the problem of scene recognition in an active vision framework, which we call active scene recognition. In traditional approaches, high-level knowledge is used in post-processing to combine the outputs of the object detectors to achieve better classification performance. In contrast, the proposed approach employs high-level knowledge actively by implementing an interaction between a reasoning module and a sensory module (Figure 1). Following this paradigm, we implemented an active scene recognizer and evaluated it on a dataset of 20 scenes and 100+ objects. We also extended it to the analysis of dynamic scenes for activity recognition with attributes. Experiments demonstrate the effectiveness of the active paradigm in introducing attention and additional constraints into the sensing process.
36 citations
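The reasoning-sensory interaction can be sketched as a belief-update loop: maintain a posterior over scene classes, and ask the sensory module to run whichever object detector most reduces expected uncertainty. The co-occurrence table, scene and object names, and entropy criterion below are illustrative assumptions in the spirit of the paper, not its actual knowledge base.

```python
import math

# P(object present | scene): hypothetical co-occurrence knowledge.
LIKELIHOOD = {
    "kitchen": {"stove": 0.90, "bed": 0.05, "monitor": 0.20},
    "bedroom": {"stove": 0.02, "bed": 0.90, "monitor": 0.30},
    "office":  {"stove": 0.01, "bed": 0.05, "monitor": 0.90},
}
belief = {s: 1.0 / len(LIKELIHOOD) for s in LIKELIHOOD}  # uniform prior

def update(belief, obj, detected):
    """Bayesian update of the scene belief after one detector run."""
    post = {s: belief[s] * (LIKELIHOOD[s][obj] if detected else 1.0 - LIKELIHOOD[s][obj])
            for s in belief}
    z = sum(post.values())
    return {s: p / z for s, p in post.items()}

def entropy(b):
    return -sum(p * math.log(p + 1e-12) for p in b.values())

def pick_detector(belief, objects):
    """Reasoning module: run the detector minimizing expected posterior entropy."""
    def expected_h(o):
        p_det = sum(belief[s] * LIKELIHOOD[s][o] for s in belief)
        return (p_det * entropy(update(belief, o, True))
                + (1.0 - p_det) * entropy(update(belief, o, False)))
    return min(objects, key=expected_h)

obj = pick_detector(belief, ["stove", "bed", "monitor"])
belief = update(belief, obj, detected=True)  # sensory module reports a hit
print(obj, {s: round(p, 2) for s, p in belief.items()})
```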
07 Jun 2015
TL;DR: A novel border ownership structure is imposed that detects boundaries and border ownership at the same time, exceeding current state-of-the-art multi-stage approaches that use more complex features.
Abstract: A method for efficient border ownership assignment in 2D images is proposed. Leveraging recent advances in using Structured Random Forests (SRF) for boundary detection [8], we impose a novel border ownership structure that detects both boundaries and border ownership at the same time. Key to this work are features that predict ownership cues from 2D images. To this end, we use several different local cues: shape, spectral properties of boundary patches, and semi-global grouping cues that are indicative of perceived depth. For shape, we use HoG-like descriptors that encode local curvature (convexity and concavity). For spectral properties, such as extremal edges [28], we first learn an orthonormal basis spanned by the top K eigenvectors via PCA over common types of contour tokens [23]. For grouping, we introduce a novel mid-level descriptor that captures patterns near edges and indicates ownership information of the boundary. Experimental results over a subset of the Berkeley Segmentation Dataset (BSDS) [24] and the NYU Depth V2 [34] dataset show that our method's performance exceeds that of current state-of-the-art multi-stage approaches that use more complex features.
32 citations
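The spectral-feature step, learning an orthonormal basis spanned by the top K eigenvectors via PCA, can be sketched directly in NumPy. The patch size, the value of K, and the random stand-in patches are assumptions; real inputs would be boundary patches around detected contour tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data: 500 flattened 15x15 boundary patches (random here;
# real inputs would be patches sampled around contour tokens).
patches = rng.normal(size=(500, 15 * 15))

mean = patches.mean(axis=0)
centered = patches - mean
# Rows of Vt are the orthonormal principal directions of the patch set.
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
K = 8
basis = Vt[:K]  # orthonormal basis spanned by the top-K eigenvectors

new_patch = rng.normal(size=15 * 15)
coeffs = basis @ (new_patch - mean)  # spectral descriptor fed to the SRF
print(coeffs.shape)  # (8,)
```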
Cited by
TL;DR: With the proposed approach, a deep architecture can be designed to learn high-level features for scene recognition in an unsupervised fashion, and experiments on standard datasets show that the method outperforms the state of the art for scene recognition.
Abstract: Scene recognition is an important problem in the field of computer vision, because it helps to narrow the gap between computers and human beings in scene understanding. Semantic modeling is a popular technique used to fill the semantic gap in scene recognition. However, most semantic modeling approaches learn shallow, one-layer representations for scene recognition, while ignoring the structural information relating images, often resulting in poor performance. Modeled after our own human visual system, as it is intended to inherit humanlike judgment, a manifold regularized deep architecture is proposed for scene recognition. The proposed deep architecture exploits the structural information of the data, providing a mapping between the visible layer and the hidden layer. By the proposed approach, a deep architecture can be designed to learn high-level features for scene recognition in an unsupervised fashion. Experiments on standard datasets show that our method outperforms the state of the art for scene recognition.
203 citations
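A minimal sketch of the manifold regularization idea: add a graph-Laplacian penalty that pulls the hidden representations of neighboring images together, on top of a reconstruction loss. The one-layer tied-weight encoder, k-NN graph, and weighting below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 64))            # stand-in image features
W = rng.normal(size=(64, 16)) * 0.1      # one hidden layer, tied weights

def knn_adjacency(X, k=5):
    """Symmetric k-NN graph over the inputs (a proxy for the data manifold)."""
    d = ((X[:, None] - X[None]) ** 2).sum(-1)
    A = np.zeros_like(d)
    for i, row in enumerate(d):
        A[i, np.argsort(row)[1:k + 1]] = 1.0  # index 0 is the point itself
    return np.maximum(A, A.T)

A = knn_adjacency(X)
L = np.diag(A.sum(axis=1)) - A           # graph Laplacian

H = np.tanh(X @ W)                       # hidden representations
recon = H @ W.T                          # linear tied-weight decoder
manifold_penalty = np.trace(H.T @ L @ H) # weighted sum of ||h_i - h_j||^2 over edges
loss = ((recon - X) ** 2).mean() + 0.1 * manifold_penalty / len(X)
print(round(float(loss), 3))
```

Minimizing the trace term keeps the hidden layer smooth along the neighbor graph, which is how structural information between images enters the otherwise unsupervised objective.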
Proceedings Article
25 Jan 2015
TL;DR: A system that learns manipulation action plans by processing unconstrained videos from the World Wide Web, robustly generating the sequence of atomic actions that compose the longer actions seen in video, in order to acquire knowledge for robots.
Abstract: In order to advance action generation and creation in robots beyond simple learned schemas, we need computational tools that allow us to automatically interpret and represent human actions. This paper presents a system that learns manipulation action plans by processing unconstrained videos from the World Wide Web. Its goal is to robustly generate the sequence of atomic actions of seen longer actions in video in order to acquire knowledge for robots. The lower level of the system consists of two convolutional neural network (CNN) based recognition modules, one for classifying the hand grasp type and the other for object recognition. The higher level is a probabilistic manipulation action grammar based parsing module that aims at generating visual sentences for robot manipulation. Experiments conducted on a publicly available unconstrained video dataset show that the system is able to learn manipulation actions by "watching" unconstrained videos with high accuracy.
202 citations
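How the two CNN modules and the grammar combine can be sketched as scoring candidate visual sentences. The stubbed distributions below stand in for the grasp and object CNNs, and the rule probabilities are hypothetical, not the paper's learned grammar.

```python
# Stubs for the two CNN recognition modules (class -> probability).
GRASP_CNN = {"power": 0.7, "precision": 0.3}
OBJECT_CNN = {"knife": 0.6, "cucumber": 0.4}

# Hypothetical rule probabilities: P(action | grasp type, object).
GRAMMAR = {
    ("power", "knife"): [("cut", 0.8), ("poke", 0.2)],
    ("power", "cucumber"): [("hold", 1.0)],
    ("precision", "knife"): [("cut", 0.5), ("carve", 0.5)],
    ("precision", "cucumber"): [("hold", 1.0)],
}

# Score every candidate visual sentence and keep the most probable parse.
grasp, obj, action, p = max(
    ((g, o, a, pg * po * pa)
     for g, pg in GRASP_CNN.items()
     for o, po in OBJECT_CNN.items()
     for a, pa in GRAMMAR[(g, o)]),
    key=lambda t: t[-1],
)
print(f"visual sentence: {action}({grasp}-grasp, {obj})  p={p:.3f}")
```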
TL;DR: In this article, the authors present a history of active perception in robotics, artificial intelligence and computer vision, highlight the seminal contributions, and argue that those contributions are as relevant today as they were decades ago and, with the state of modern computational tools, are poised to find new life in the robotic perception systems of the next decade.
Abstract: Despite the recent successes in robotics, artificial intelligence and computer vision, a complete artificial agent necessarily must include active perception. A multitude of ideas and methods for how to accomplish this have already appeared in the past, their broader utility perhaps impeded by insufficient computational power or costly hardware. The history of these ideas, perhaps selective due to our perspectives, is presented with the goal of organizing the past literature and highlighting the seminal contributions. We argue that those contributions are as relevant today as they were decades ago and, with the state of modern computational tools, are poised to find new life in the robotic perception systems of the next decade.
180 citations