
Showing papers by "Takeo Kanade published in 2016"


Proceedings ArticleDOI
30 Jan 2016
TL;DR: Convolutional networks are incorporated into the pose machine framework to learn image features and image-dependent spatial models for pose estimation, implicitly modeling long-range dependencies between variables in structured prediction tasks such as articulated pose estimation.
Abstract: Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.
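
The staged refinement the abstract describes, convolutional stages operating on the belief maps of previous stages with a loss at every stage, can be sketched as follows. This is a minimal PyTorch illustration with assumed layer sizes and module names (Stage, PoseMachine, a toy MSE target), not the authors' released architecture.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    """One refinement stage: image features + previous beliefs -> new beliefs."""
    def __init__(self, feat_ch, n_parts):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch + n_parts, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, n_parts, 1),          # per-part belief maps
        )
    def forward(self, feats, beliefs):
        return self.net(torch.cat([feats, beliefs], dim=1))

class PoseMachine(nn.Module):
    def __init__(self, n_parts=14, n_stages=3, feat_ch=32):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, feat_ch, 9, padding=4),
                                      nn.ReLU())
        self.first = nn.Conv2d(feat_ch, n_parts, 1)    # initial beliefs
        self.stages = nn.ModuleList(Stage(feat_ch, n_parts)
                                    for _ in range(n_stages))
    def forward(self, img):
        feats = self.backbone(img)
        beliefs = [self.first(feats)]
        for stage in self.stages:       # each stage refines the previous maps
            beliefs.append(stage(feats, beliefs[-1]))
        return beliefs                  # one set of belief maps per stage

# Intermediate supervision: the loss is applied at every stage, which is
# what replenishes back-propagated gradients deep in the sequence.
model = PoseMachine()
img = torch.randn(2, 3, 64, 64)
target = torch.randn(2, 14, 64, 64)     # stand-in ground-truth belief maps
loss = sum(nn.functional.mse_loss(b, target) for b in model(img))
loss.backward()
```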

2,687 citations


Posted Content
TL;DR: This work designs a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference in structured prediction tasks such as articulated pose estimation.
Abstract: Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.

317 citations


Posted Content
TL;DR: Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive.
Abstract: Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive. With the advent of rich 3D repositories, photo-realistic rendering systems offer the opportunity to provide nearly limitless data. Yet, their primary value for visual learning may be the quality of the data they can provide rather than the quantity. Rendering engines offer the promise of perfect labels in addition to the data: what the precise camera pose is; what the precise lighting location, temperature, and distribution is; what the geometry of the object is. In this work we focus on semi-automating dataset creation through use of synthetic data and apply this method to an important task -- object viewpoint estimation. Using state-of-the-art rendering software we generate a large labeled dataset of cars rendered densely in viewpoint space. We investigate the effect of rendering parameters on estimation performance and show realism is important. We show that generalizing from synthetic data is not harder than the domain adaptation required between two real-image datasets and that combining synthetic images with a small amount of real data improves estimation accuracy.
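
The final experiment, combining synthetic images with a small amount of real data, can be sketched as a weighted mixture of two datasets. A minimal PyTorch sketch with made-up tensors, 10-degree azimuth bins as viewpoint labels, and an assumed 10x upweighting of the scarce real images:

```python
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Stand-in datasets: many rendered cars, few real ones, each labeled with
# a viewpoint bin (36 bins of 10 degrees; all values here are toys).
synthetic = TensorDataset(torch.randn(1000, 3, 32, 32),
                          torch.randint(0, 36, (1000,)))
real      = TensorDataset(torch.randn(50, 3, 32, 32),
                          torch.randint(0, 36, (50,)))

combined = ConcatDataset([synthetic, real])
# Upweight the scarce real images so every batch mixes both domains.
weights = torch.cat([torch.full((len(synthetic),), 1.0),
                     torch.full((len(real),), 10.0)])
sampler = WeightedRandomSampler(weights, num_samples=len(combined))
loader = DataLoader(combined, batch_size=32, sampler=sampler)

images, viewpoint_bins = next(iter(loader))   # mixed-domain training batch
```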

103 citations


Book ChapterDOI
08 Oct 2016
TL;DR: Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive.
Abstract: Data seems cheap to get, and in many ways it is, but the process of creating a high quality labeled dataset from a mass of data is time-consuming and expensive.

62 citations


Journal ArticleDOI
TL;DR: Experiments on three types of cell populations validate that the interactive cell segmentation algorithm quickly reaches high-quality results with minimal human intervention and is significantly more efficient than alternative methods, since the most informative samples are selected for human annotation/verification early.
Abstract: Automatic cell segmentation can hardly be flawless due to the complexity of image data, particularly when time-lapse experiments last for a long time without biomarkers. To address this issue, we propose an interactive cell segmentation method that classifies feature-homogeneous superpixels into specific classes, guided by human interventions. Specifically, we actively select the most informative superpixels by minimizing the expected prediction error, which is upper bounded by the transductive Rademacher complexity, and then query for human annotations. After propagating the user-specified labels to the remaining unlabeled superpixels via an affinity graph, error-prone superpixels are selected automatically and human verification is requested on them; once erroneous segmentation is detected and corrected, the correction is propagated efficiently over a gradually augmented graph to unlabeled superpixels so that analogous errors are fixed at the same time. The correction propagation step is conducted efficiently by introducing a verification propagation matrix rather than rebuilding the affinity graph and re-performing the label propagation from the beginning. We repeat this procedure until most superpixels are classified into a specific category with high confidence. Experiments performed on three types of cell populations validate that our interactive cell segmentation algorithm quickly reaches high-quality results with minimal human intervention and is significantly more efficient than alternative methods, since the most informative samples are selected for human annotation/verification early.
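
The label propagation step, in which user-specified superpixel labels spread over an affinity graph to the unlabeled superpixels, has a classic closed form. The sketch below is generic Zhou-style propagation on a toy five-superpixel graph, not the paper's verification propagation matrix; the affinities and alpha are assumptions.

```python
import numpy as np

def propagate_labels(W, Y0, alpha=0.9):
    """W: (n, n) symmetric affinities; Y0: (n, k) one-hot seeds, zero rows if unlabeled."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt                  # normalized affinity
    n = W.shape[0]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y0)   # closed-form fixed point
    return F.argmax(axis=1)

# Toy graph: 5 superpixels, the user labels only superpixels 0 and 4.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], float)
Y0 = np.zeros((5, 2))
Y0[0, 0] = 1     # annotated as "cell"
Y0[4, 1] = 1     # annotated as "background"
print(propagate_labels(W, Y0))   # unlabeled superpixels inherit nearby labels
```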

51 citations


Patent
20 May 2016
TL;DR: An image processing system and/or method obtains source images in which a damaged vehicle is represented, and performs image processing techniques to determine, predict, estimate, and/or detect damage that has occurred at various locations on the vehicle.
Abstract: An image processing system and/or method obtains source images in which a damaged vehicle is represented, and performs image processing techniques to determine, predict, estimate, and/or detect damage that has occurred at various locations on the vehicle. The image processing techniques may include generating a composite image of the damaged vehicle, aligning and/or isolating the image, applying convolutional neural network techniques to the image to generate damage parameter values, where each value corresponds to damage in a particular location of the vehicle, and/or other techniques. Based on the damage values, the image processing system/method generates and displays a heat map for the vehicle, where each color and/or color gradation corresponds to respective damage at a respective location on the vehicle. The heat map may be manipulated by the user, and may include user controls for displaying additional information corresponding to the damage at a particular location on the vehicle.
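
The heat-map step, mapping per-location damage values to colors and gradations over the vehicle image, can be sketched as a colormap lookup plus blending. The coarse region grid, colormap choice, and blending weights below are illustrative assumptions, not the patented pipeline.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colormaps

# Suppose the network emits one damage value per cell of a coarse grid
# laid over the composite vehicle image (higher = more damage; toy values).
damage = np.array([[0.1, 0.2, 0.1],
                   [0.3, 0.9, 0.4],
                   [0.1, 0.5, 0.2]])

vehicle_img = np.ones((300, 450, 3)) * 0.8            # stand-in photo
heat = colormaps["jet"](damage / damage.max())[..., :3]
heat = np.kron(heat, np.ones((100, 150, 1)))          # upsample grid to image size

plt.imshow(0.6 * vehicle_img + 0.4 * heat)            # blended overlay
plt.axis("off")
plt.show()
```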

36 citations


Posted Content
TL;DR: In this article, the Panoptic Studio is used to capture the 3D motion of a group of people engaged in a social interaction, and a modularized system consisting of integrated structural, hardware, and software innovations is presented.
Abstract: We present an approach to capture the 3D motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent; (2) subtle motion needs to be measured over a space large enough to host a social group; (3) human appearance and configuration variation is immense; and (4) attaching markers to the body may prime the nature of interactions. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the integration of perceptual analyses over a large variety of view points. We present a modularized system designed around this principle, consisting of integrated structural, hardware, and software innovations. The system takes, as input, 480 synchronized video streams of multiple people engaged in social activities, and produces, as output, the labeled time-varying 3D structure of anatomical landmarks on individuals in the space. Our algorithm is designed to fuse the "weak" perceptual processes in the large number of views by progressively generating skeletal proposals from low-level appearance cues, and a framework for temporal refinement is also presented by associating body parts to reconstructed dense 3D trajectory stream. Our system and method are the first in reconstructing full body motion of more than five people engaged in social interactions without using markers. We also empirically demonstrate the impact of the number of views in achieving this goal.
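
Fusing weak 2D part detections from many calibrated views into a labeled 3D landmark ultimately rests on multi-view triangulation. A minimal numpy sketch using linear (DLT) triangulation, with random synthetic cameras standing in for the studio's 480 calibrated views:

```python
import numpy as np

def triangulate(P_list, xy_list):
    """P_list: per-view 3x4 projection matrices; xy_list: per-view (x, y) pixels."""
    A = []
    for P, (x, y) in zip(P_list, xy_list):
        A.append(x * P[2] - P[0])   # each view adds two linear constraints
        A.append(y * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                      # null vector = homogeneous 3D point
    return X[:3] / X[3]

# Toy setup: a known 3D joint observed by six random cameras.
rng = np.random.default_rng(0)
X_true = np.array([0.5, 1.0, 2.0, 1.0])
Ps, xys = [], []
for _ in range(6):
    P = rng.normal(size=(3, 4))
    x = P @ X_true
    Ps.append(P)
    xys.append((x[0] / x[2], x[1] / x[2]))
print(triangulate(Ps, xys))         # recovers ~[0.5, 1.0, 2.0]
```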

17 citations


Posted Content
TL;DR: This work introduces the concept of a Visual Compiler that generates a scene specific pedestrian detector and pose estimator without any pedestrian observations, and demonstrates that when real human annotated data is scarce or non-existent, this data generation strategy can provide an excellent solution for bootstrapping human detection and pose estimation.
Abstract: We introduce the concept of a Visual Compiler that generates a scene-specific pedestrian detector and pose estimator without any pedestrian observations. Given a single image and auxiliary scene information in the form of camera parameters and the geometric layout of the scene, the Visual Compiler first infers geometrically and photometrically accurate images of humans in that scene through the use of computer graphics rendering. Using these renders, we learn a scene- and region-specific, spatially-varying fully convolutional neural network for simultaneous detection, pose estimation and segmentation of pedestrians. We demonstrate that when real human-annotated data is scarce or non-existent, our data generation strategy can provide an excellent solution for bootstrapping human detection and pose estimation. Experimental results show that our approach outperforms off-the-shelf state-of-the-art pedestrian detectors and pose estimators that are trained on real data.
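
The geometric starting point, using camera parameters and scene layout to decide where and at what scale to render a pedestrian, reduces to pinhole projection of a ground-plane point. A toy numpy sketch with assumed intrinsics and camera height:

```python
import numpy as np

# World convention here: y points down, matching image coordinates.
K = np.array([[800, 0, 320],
              [0, 800, 240],
              [0,   0,   1]], float)        # assumed intrinsics
R = np.eye(3)
t = np.array([0.0, 1.6, 0.0])               # camera 1.6 m above the ground

def project(X):
    x = K @ (R @ X + t)
    return x[:2] / x[2]

feet = np.array([1.0, 0.0, 8.0])            # a spot on the ground plane
head = feet + np.array([0.0, -1.7, 0.0])    # a 1.7 m tall pedestrian
print(project(feet), project(head))         # ~170 px apart: render scale and
                                            # placement for this scene region
```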

13 citations


Journal ArticleDOI
TL;DR: This CVIU special issue gathers recent and varied works on assistive computer vision and robotics, with applications in robotics such as multi-modal human-robot interaction, autonomous navigation, object usage, place recognition, robotic manipulation, and egocentric vision.

8 citations


Journal ArticleDOI
TL;DR: The proposed object representation consists of an approximated geometry model and a viewpoint-scale invariant appearance model, which makes it possible to model a new object online and provides robustness to viewpoint variation and occlusion.
Abstract: Various object representations have been widely used for many tasks such as object detection, recognition, and tracking. Most of them require an intensive training process on a large database collected in advance, and it is hard to add models of a previously unobserved object that is not in the database. In this paper, we investigate how to create a representation of a new and unknown object online, and how to apply it to practical applications like object detection and tracking. To make this viable, we utilize a sensor fusion approach using a camera and a single-line scan LIDAR. The proposed representation consists of an approximated geometry model and a viewpoint-scale invariant appearance model, which makes it extremely simple to match the model and the observation. This property makes it possible to model a new object online, and provides robustness to viewpoint variation and occlusion. The representation has benefits of both an implicit model (referred to as a view-based model) and an explicit model (referred to as a shape-based model). Intensive experiments using synthetic and real data demonstrate the viability of the proposed object representation in both modeling and detecting/tracking objects.
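
The scale half of the viewpoint-scale invariance is where the single-line LIDAR earns its keep: metric depth lets an image patch be resampled to a canonical metres-per-pixel template before matching. A back-of-the-envelope sketch with assumed focal length, range, and object width:

```python
f_px = 700.0             # camera focal length in pixels (assumed)
depth_m = 12.5           # single-line LIDAR range to the object
object_width_m = 1.8     # approximate geometry model, e.g. a car's width

# Pixel width the object should occupy at this depth (pinhole model):
width_px = f_px * object_width_m / depth_m
print(round(width_px))   # ~101 px: crop this window, then resize it to a
                         # fixed template size so matching stays trivial
```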

2 citations


Book ChapterDOI
01 Jan 2016
TL;DR: This chapter focuses on image analysis and understanding of live cell populations in time-lapse phase contrast microscopy, covering state-of-the-art algorithms for cell segmentation and cell behavior understanding.
Abstract: This chapter focuses on image analysis and understanding of live cell populations in time lapse phase contrast microscopy. The computer vision tasks involve cell segmentation and cell behavior understanding, including cell migration, division (mitosis), death (apoptosis), and differentiation. We will describe the problem definition for each topic, introduce the general schools of approaches that have been explored, discuss details of the state-of-the-art algorithms, and propose promising directions for future investigation.

Book ChapterDOI
20 Nov 2016
TL;DR: This work proposes a second order linear regression method that is both compact and robust against strong rotations, and provides a closed form solution, making the method fast to train.
Abstract: Recent methods for facial landmark location perform well on close-to-frontal faces but have problems generalising to large head rotations. In order to address this issue we propose a second order linear regression method that is both compact and robust against strong rotations. We provide a closed form solution, making the method fast to train. We test the method's performance on two challenging datasets. The first has been used intensively by the community. The second has been specially generated from a well known 3D face dataset. It is considerably more challenging, including a high diversity of rotations and more samples than any other existing public dataset. The proposed method is compared against state-of-the-art approaches, including RCPR, CGPRT, LBF, CFSS, and GSDM. Results on both datasets show that the proposed method offers state-of-the-art performance on near frontal view data, improves on state-of-the-art methods for more challenging head rotation problems, and keeps a compact model size.
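
The "closed form solution, fast to train" claim is the familiar property of regularised linear least squares. The sketch below is plain ridge regression from features to landmark displacements on synthetic data; the paper's second order formulation is more elaborate, so treat this only as the flavour of the closed-form solve.

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(500, 128))   # per-sample feature vectors (toy)
dS = rng.normal(size=(500, 136))    # target shape updates: 68 (x, y) pairs

lam = 1e-2                          # ridge term keeps the solve stable
# Closed form: W = (Phi^T Phi + lam I)^{-1} Phi^T dS, one linear solve.
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(128), Phi.T @ dS)

pred = Phi @ W                      # test time is a single matrix product
print(pred.shape)                   # (500, 136)
```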
