Showing papers on "Video tracking published in 2013"

PDF

Open Access

Proceedings Article•DOI•

[...]

Yi Wu¹, Jongwoo Lim², Ming-Hsuan Yang¹•Institutions (2)

University of California, Merced¹, Hanyang University²

23 Jun 2013

TL;DR: Large scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and provide potential future research directions in this field.

...read moreread less

Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

...read moreread less

3,828 citations

Journal Article•DOI•

Enhanced Computer Vision With Microsoft Kinect Sensor: A Review

[...]

Jungong Han, Ling Shao¹, Dong Xu², Jamie Shotton³•Institutions (3)

Nanjing University of Information Science and Technology¹, Nanyang Technological University², Microsoft³

25 Jun 2013-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: A comprehensive review of recent Kinect-based computer vision algorithms and applications covering topics including preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping.

...read moreread less

Abstract: With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.

...read moreread less

1,513 citations

Proceedings Article•DOI•

SLAM++: Simultaneous Localisation and Mapping at the Level of Objects

[...]

Renato F. Salas-Moreno¹, Richard Newcombe², Hauke Strasdat¹, Paul H. J. Kelly¹, Andrew J. Davison¹ - Show less +1 more•Institutions (2)

Imperial College London¹, University of Washington²

23 Jun 2013

TL;DR: The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions, as well as the generation of an object level scene description with the potential to enable interaction.

...read moreread less

Abstract: We present the major advantages of a new 'object oriented' 3D SLAM paradigm, which takes full advantage in the loop of prior knowledge that many scenes consist of repeated, domain-specific objects and structures. As a hand-held depth camera browses a cluttered scene, real-time 3D object recognition and tracking provides 6DoF camera-object constraints which feed into an explicit graph of objects, continually refined by efficient pose-graph optimisation. This offers the descriptive and predictive power of SLAM systems which perform dense surface reconstruction, but with a huge representation compression. The object graph enables predictions for accurate ICP-based camera to model tracking at each live frame, and efficient active search for new objects in currently undescribed image regions. We demonstrate real-time incremental SLAM in large, cluttered environments, including loop closure, relocalisation and the detection of moved objects, and of course the generation of an object level scene description with the potential to enable interaction.

...read moreread less

950 citations

Proceedings Article•

Learning a Deep Compact Image Representation for Visual Tracking

[...]

Naiyan Wang¹, Dit-Yan Yeung¹•Institutions (1)

Hong Kong University of Science and Technology¹

05 Dec 2013

TL;DR: Comparison with the state-of-the-art trackers on some challenging benchmark video sequences shows that the deep learning tracker is more accurate while maintaining low computational cost with real-time performance when the MATLAB implementation of the tracker is used with a modest graphics processing unit (GPU).

...read moreread less

Abstract: In this paper, we study the challenging problem of tracking the trajectory of a moving object in a video with possibly very complex background. In contrast to most existing trackers which only learn the appearance of the tracked object online, we take a different approach, inspired by recent advances in deep learning architectures, by putting more emphasis on the (unsupervised) feature learning problem. Specifically, by using auxiliary natural images, we train a stacked de-noising autoencoder offline to learn generic image features that are more robust against variations. This is then followed by knowledge transfer from offline training to the online tracking process. Online tracking involves a classification neural network which is constructed from the encoder part of the trained autoencoder as a feature extractor and an additional classification layer. Both the feature extractor and the classifier can be further tuned to adapt to appearance changes of the moving object. Comparison with the state-of-the-art trackers on some challenging benchmark video sequences shows that our deep learning tracker is more accurate while maintaining low computational cost with real-time performance when our MATLAB implementation of the tracker is used with a modest graphics processing unit (GPU).

...read moreread less

926 citations

Journal Article•DOI•

Looking at Vehicles on the Road: A Survey of Vision-Based Vehicle Detection, Tracking, and Behavior Analysis

[...]

Sayanan Sivaraman¹, Mohan M. Trivedi¹•Institutions (1)

University of California, San Diego¹

01 Dec 2013-IEEE Transactions on Intelligent Transportation Systems

TL;DR: This paper provides a review of the literature in on-road vision-based vehicle detection, tracking, and behavior understanding, and discusses the nascent branch of intelligent vehicles research concerned with utilizing spatiotemporal measurements, trajectories, and various features to characterize on- road behavior.

...read moreread less

Abstract: This paper provides a review of the literature in on-road vision-based vehicle detection, tracking, and behavior understanding. Over the past decade, vision-based surround perception has progressed from its infancy into maturity. We provide a survey of recent works in the literature, placing vision-based vehicle detection in the context of sensor-based on-road surround analysis. We detail advances in vehicle detection, discussing monocular, stereo vision, and active sensor-vision fusion for on-road vehicle detection. We discuss vision-based vehicle tracking in the monocular and stereo-vision domains, analyzing filtering, estimation, and dynamical models. We discuss the nascent branch of intelligent vehicles research concerned with utilizing spatiotemporal measurements, trajectories, and various features to characterize on-road behavior. We provide a discussion on the state of the art, detail common performance metrics and benchmarks, and provide perspective on future research directions in the field.

...read moreread less

862 citations

Journal Article•DOI•

Phase-based video motion processing

[...]

Neal Wadhwa¹, Michael Rubinstein¹, Frédo Durand¹, William T. Freeman¹•Institutions (1)

Massachusetts Institute of Technology¹

21 Jul 2013

TL;DR: A technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids that supports larger amplification factors and is significantly less sensitive to noise is introduced.

...read moreread less

Abstract: We introduce a technique to manipulate small movements in videos based on an analysis of motion in complex-valued image pyramids. Phase variations of the coefficients of a complex-valued steerable pyramid over time correspond to motion, and can be temporally processed and amplified to reveal imperceptible motions, or attenuated to remove distracting changes. This processing does not involve the computation of optical flow, and in comparison to the previous Eulerian Video Magnification method it supports larger amplification factors and is significantly less sensitive to noise. These improved capabilities broaden the set of applications for motion processing in videos. We demonstrate the advantages of this approach on synthetic and natural video sequences, and explore applications in scientific analysis, visualization and video enhancement.

...read moreread less

719 citations

Journal Article•DOI•

Intelligent multi-camera video surveillance: A review

[...]

Xiaogang Wang¹•Institutions (1)

The Chinese University of Hong Kong¹

01 Jan 2013-Pattern Recognition Letters

TL;DR: This paper reviews the recent development of relevant technologies from the perspectives of computer vision and pattern recognition, and discusses how to face emerging challenges of intelligent multi-camera video surveillance.

...read moreread less

695 citations

Proceedings Article•DOI•

Fast Object Segmentation in Unconstrained Video

[...]

Anestis Papazoglou¹, Vittorio Ferrari¹•Institutions (1)

University of Edinburgh¹

01 Dec 2013

TL;DR: This method is fast, fully automatic, and makes minimal assumptions about the video, which enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations.

...read moreread less

Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.

...read moreread less

662 citations

Journal Article•DOI•

A survey of appearance models in visual object tracking

[...]

Xi Li¹, Weiming Hu¹, Chunhua Shen², Zhongfei Zhang³, Anthony Dick², Anton van den Hengel² - Show less +2 more•Institutions (3)

Chinese Academy of Sciences¹, University of Adelaide², Binghamton University³

08 Oct 2013-ACM Transactions on Intelligent Systems and Technology

TL;DR: A detailed review of the existing 2D appearance models for visual object tracking can be found in this article, where the authors decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling.

...read moreread less

Abstract: Visual object tracking is a significant computer vision task which can be applied to many domains, such as visual surveillance, human computer interaction, and video compression. Despite extensive research on this topic, it still suffers from difficulties in handling complex object appearance changes caused by factors such as illumination variation, partial occlusion, shape deformation, and camera motion. Therefore, effective modeling of the 2D appearance of tracked objects is a key issue for the success of a visual tracker. In the literature, researchers have proposed a variety of 2D appearance models. To help readers swiftly learn the recent advances in 2D appearance models for visual object tracking, we contribute this survey, which provides a detailed review of the existing 2D appearance models. In particular, this survey takes a module-based architecture that enables readers to easily grasp the key points of visual object tracking. In this survey, we first decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling. Then, different 2D appearance models are categorized and discussed with respect to their composition modules. Finally, we address several issues of interest as well as the remaining challenges for future research on this topic. The contributions of this survey are fourfold. First, we review the literature of visual representations according to their feature-construction mechanisms (i.e., local and global). Second, the existing statistical modeling schemes for tracking-by-detection are reviewed according to their model-construction mechanisms: generative, discriminative, and hybrid generative-discriminative. Third, each type of visual representations or statistical modeling techniques is analyzed and discussed from a theoretical or practical viewpoint. Fourth, the existing benchmark resources (e.g., source codes and video datasets) are examined in this survey.

...read moreread less

653 citations

Posted Content•

A Survey of Appearance Models in Visual Object Tracking

[...]

Xi Li¹, Weiming Hu¹, Chunhua Shen², Zhongfei Zhang³, Anthony Dick², Anton van den Hengel² - Show less +2 more•Institutions (3)

Chinese Academy of Sciences¹, University of Adelaide², Binghamton University³

20 Mar 2013-arXiv: Computer Vision and Pattern Recognition

TL;DR: This survey provides a detailed review of the existing 2D appearance models for visual object tracking and takes a module-based architecture that enables readers to easily grasp the key points ofVisual object tracking.

...read moreread less

Abstract: Visual object tracking is a significant computer vision task which can be applied to many domains such as visual surveillance, human computer interaction, and video compression. In the literature, researchers have proposed a variety of 2D appearance models. To help readers swiftly learn the recent advances in 2D appearance models for visual object tracking, we contribute this survey, which provides a detailed review of the existing 2D appearance models. In particular, this survey takes a module-based architecture that enables readers to easily grasp the key points of visual object tracking. In this survey, we first decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling. Then, different 2D appearance models are categorized and discussed with respect to their composition modules. Finally, we address several issues of interest as well as the remaining challenges for future research on this topic. The contributions of this survey are four-fold. First, we review the literature of visual representations according to their feature-construction mechanisms (i.e., local and global). Second, the existing statistical modeling schemes for tracking-by-detection are reviewed according to their model-construction mechanisms: generative, discriminative, and hybrid generative-discriminative. Third, each type of visual representations or statistical modeling techniques is analyzed and discussed from a theoretical or practical viewpoint. Fourth, the existing benchmark resources (e.g., source code and video datasets) are examined in this survey.

...read moreread less

605 citations

Proceedings Article•DOI•

Story-Driven Summarization for Egocentric Video

[...]

Zheng Lu¹, Kristen Grauman¹•Institutions (1)

University of Texas at Austin¹

23 Jun 2013

TL;DR: A video summarization approach that discovers the story of an egocentric video, and defines a random-walk based metric of influence between sub shots that reflects how visual objects contribute to the progression of events.

...read moreread less

Abstract: We present a video summarization approach that discovers the story of an egocentric video. Given a long input video, our method selects a short chain of video sub shots depicting the essential events. Inspired by work in text analysis that links news articles over time, we define a random-walk based metric of influence between sub shots that reflects how visual objects contribute to the progression of events. Using this influence metric, we define an objective for the optimal k-subs hot summary. Whereas traditional methods optimize a summary's diversity or representative ness, ours explicitly accounts for how one sub-event "leads to" another-which, critically, captures event connectivity beyond simple object co-occurrence. As a result, our summaries provide a better sense of story. We apply our approach to over 12 hours of daily activity video taken from 23 unique camera wearers, and systematically evaluate its quality compared to multiple baselines with 34 human subjects.

...read moreread less

Journal Article•DOI•

Online Object Tracking With Sparse Prototypes

[...]

Dong Wang¹, Huchuan Lu¹, Ming-Hsuan Yang²•Institutions (2)

Dalian University of Technology¹, University of California, Merced²

01 Jan 2013-IEEE Transactions on Image Processing

TL;DR: This paper proposes a novel online object tracking algorithm with sparse prototypes, which exploits both classic principal component analysis (PCA) algorithms with recent sparse representation schemes for learning effective appearance models, and introduces l1 regularization into the PCA reconstruction.

...read moreread less

Abstract: Online object tracking is a challenging problem as it entails learning an effective model to account for appearance change caused by intrinsic and extrinsic factors. In this paper, we propose a novel online object tracking algorithm with sparse prototypes, which exploits both classic principal component analysis (PCA) algorithms with recent sparse representation schemes for learning effective appearance models. We introduce l1 regularization into the PCA reconstruction, and develop a novel algorithm to represent an object by sparse prototypes that account explicitly for data and noise. For tracking, objects are represented by the sparse prototypes learned online with update. In order to reduce tracking drift, we present a method that takes occlusion and motion blur into account rather than simply includes image observations for model update. Both qualitative and quantitative evaluations on challenging image sequences demonstrate that the proposed tracking algorithm performs favorably against several state-of-the-art methods.

...read moreread less

Proceedings Article•DOI•

Real-Time 3D Reconstruction in Dynamic Scenes Using Point-Based Fusion

[...]

Maik Keller¹, Damien Lefloch¹, Martin Lambers¹, Shahram Izadi², Tim Weyrich³, Andreas Kolb¹ - Show less +2 more•Institutions (3)

University of Siegen¹, Microsoft², University College London³

29 Jun 2013

TL;DR: A new system for real-time dense reconstruction with equivalent quality to existing online methods, but with support for additional spatial scale and robustness in dynamic scenes, designed around a simple and flat point-Based representation.

...read moreread less

Abstract: Real-time or online 3D reconstruction has wide applicability and receives further interest due to availability of consumer depth cameras. Typical approaches use a moving sensor to accumulate depth measurements into a single model which is continuously refined. Designing such systems is an intricate balance between reconstruction quality, speed, spatial scale, and scene assumptions. Existing online methods either trade scale to achieve higher quality reconstructions of small objects/scenes. Or handle larger scenes by trading real-time performance and/or quality, or by limiting the bounds of the active reconstruction. Additionally, many systems assume a static scene, and cannot robustly handle scene motion or reconstructions that evolve to reflect scene changes. We address these limitations with a new system for real-time dense reconstruction with equivalent quality to existing online methods, but with support for additional spatial scale and robustness in dynamic scenes. Our system is designed around a simple and flat point-Based representation, which directly works with the input acquired from range/depth sensors, without the overhead of converting between representations. The use of points enables speed and memory efficiency, directly leveraging the standard graphics pipeline for all central operations, i.e., camera pose estimation, data association, outlier removal, fusion of depth maps into a single denoised model, and detection and update of dynamic objects. We conclude with qualitative and quantitative results that highlight robust tracking and high quality reconstructions of a diverse set of scenes at varying scales.

...read moreread less

Journal Article•DOI•

A survey on activity recognition and behavior understanding in video surveillance

[...]

Sarvesh Vishwakarma¹, Sarvesh Vishwakarma², Anupam Agrawal²•Institutions (2)

Bharat Heavy Electricals¹, Indian Institute of Information Technology, Allahabad²

01 Oct 2013-The Visual Computer

TL;DR: This paper provides an overview of benchmark databases for activity recognition, the market analysis of video surveillance, and future directions to work on for this application.

...read moreread less

Abstract: This paper provides a comprehensive survey for activity recognition in video surveillance. It starts with a description of simple and complex human activity, and various applications. The applications of activity recognition are manifold, ranging from visual surveillance through content based retrieval to human computer interaction. The organization of this paper covers all aspects of the general framework of human activity recognition. Then it summarizes and categorizes recent-published research progresses under a general framework. Finally, this paper also provides an overview of benchmark databases for activity recognition, the market analysis of video surveillance, and future directions to work on for this application.

...read moreread less

Book•

Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation

[...]

Peter M. Kuhn, M Kuhn Peter

17 Jan 2013

TL;DR: Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation is an important introduction to numerous algorithmic, architectural and system design aspects of the multimedia standard MPEG-2 and H.263.

...read moreread less

Abstract: MPEG-4 is the multimedia standard for combining interactivity, natural and synthetic digital video, audio and computer-graphics Typical applications are: internet, video conferencing, mobile videophones, multimedia cooperative work, teleteaching and games With MPEG-4 the next step from block-based video (ISO/IEC MPEG-1, MPEG-2, CCITT H261, ITU-T H263) to arbitrarily-shaped visual objects is taken This significant step demands a new methodology for system analysis and design to meet the considerably higher flexibility of MPEG-4 Motion estimation is a central part of MPEG-1/2/4 and H261/H263 video compression standards and has attracted much attention in research and industry, for the following reasons: it is computationally the most demanding algorithm of a video encoder (about 60-80% of the total computation time), it has a high impact on the visual quality of a video encoder, and it is not standardized, thus being open to competition Algorithms, Complexity Analysis, and VLSI Architectures for MPEG-4 Motion Estimation covers in detail every single step in the design of a MPEG-1/2/4 or H261/H263 compliant video encoder: Fast motion estimation algorithms Complexity analysis tools Detailed complexity analysis of a software implementation of MPEG-4 video Complexity and visual quality analysis of fast motion estimation algorithms within MPEG-4 Design space on motion estimation VLSI architectures Detailed VLSI design examples of (1) a high throughput and (2) a low-power MPEG-4 motion estimator Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation is an important introduction to numerous algorithmic, architectural and system design aspects of the multimedia standard MPEG-4 As such, all researchers, students and practitioners working in image processing, video coding or system and VLSI design will find this book of interest

...read moreread less

Proceedings Article•DOI•

Video Object Segmentation through Spatially Accurate and Temporally Dense Extraction of Primary Object Regions

[...]

Dong Zhang¹, Omar Javed², Mubarak Shah¹•Institutions (2)

University of Central Florida¹, Princeton University²

23 Jun 2013

TL;DR: The extracted primary object regions are then used to build object models for optimized video segmentation and outperforms both unsupervised and supervised state-of-the-art methods.

...read moreread less

Abstract: In this paper, we propose a novel approach to extract primary object segments in videos in the `object proposal' domain. The extracted primary object regions are then used to build object models for optimized video segmentation. The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set where motion based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background. The proposed approach is evaluated using several challenging benchmark videos and it outperforms both unsupervised and supervised state-of-the-art methods.

...read moreread less

Proceedings Article•DOI•

[...]

Eshed Ohn-Bar¹, Mohan M. Trivedi¹•Institutions (1)

University of California, San Diego¹

23 Jun 2013

TL;DR: A set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition are proposed, and a new descriptor for spatio-temporal feature extraction from color and depth images is introduced.

...read moreread less

Abstract: We propose a set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition. The descriptors proposed are easy to implement, produce relatively small-sized feature sets, and the multi-class classification scheme is fast and suitable for real-time applications. We intuitively characterize actions using pairwise affinities between view-invariant joint angles features over the performance of an action. Additionally, a new descriptor for spatio-temporal feature extraction from color and depth images is introduced. This descriptor involves an application of a modified histogram of oriented gradients (HOG) algorithm. The application produces a feature set at every frame, and these features are collected into a 2D array which then the same algorithm is applied to again (the approach is termed HOG2). Both feature sets are evaluated in a bag-of-words scheme using a linear SVM, showing state-of-the-art results on public datasets from different domains of human-computer interaction.

...read moreread less

Proceedings Article•DOI•

Visual Tracking via Locality Sensitive Histograms

[...]

Shengfeng He¹, Qingxiong Yang¹, Rynson W. H. Lau¹, Jiang Wang¹, Ming-Hsuan Yang² - Show less +1 more•Institutions (2)

City University of Hong Kong¹, University of California, Merced²

23 Jun 2013

TL;DR: A robust tracking framework based on the locality sensitive histograms is proposed, which consists of two main components: a new feature for tracking that is robust to illumination changes and a novel multi-region tracking algorithm that runs in real time even with hundreds of regions.

...read moreread less

Abstract: This paper presents a novel locality sensitive histogram algorithm for visual tracking. Unlike the conventional image histogram that counts the frequency of occurrences of each intensity value by adding ones to the corresponding bin, a locality sensitive histogram is computed at each pixel location and a floating-point value is added to the corresponding bin for each occurrence of an intensity value. The floating-point value declines exponentially with respect to the distance to the pixel location where the histogram is computed, thus every pixel is considered but those that are far away can be neglected due to the very small weights assigned. An efficient algorithm is proposed that enables the locality sensitive histograms to be computed in time linear in the image size and the number of bins. A robust tracking framework based on the locality sensitive histograms is proposed, which consists of two main components: a new feature for tracking that is robust to illumination changes and a novel multi-region tracking algorithm that runs in real time even with hundreds of regions. Extensive experiments demonstrate that the proposed tracking framework outperforms the state-of-the-art methods in challenging scenarios, especially when the illumination changes dramatically.

...read moreread less

Proceedings Article•DOI•

First-Person Activity Recognition: What Are They Doing to Me?

[...]

Michael S. Ryoo¹, Larry Matthies¹•Institutions (1)

California Institute of Technology¹

23 Jun 2013

TL;DR: This paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos.

...read moreread less

Abstract: This paper discusses the problem of recognizing interaction-level human activities from a first-person viewpoint. The goal is to enable an observer (e.g., a robot or a wearable camera) to understand 'what activity others are performing to it' from continuous video inputs. These include friendly interactions such as 'a person hugging the observer' as well as hostile interactions like 'punching the observer' or 'throwing objects to the observer', whose videos involve a large amount of camera ego-motion caused by physical interactions. The paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers temporal structures displayed in first-person activity videos. In our experiments, we not only show classification results with segmented videos, but also confirm that our new approach is able to detect activities from continuous videos reliably.

...read moreread less

Proceedings Article•DOI•

Self-Paced Learning for Long-Term Tracking

[...]

James Steven Supancic¹, Deva Ramanan¹•Institutions (1)

University of California, Irvine¹

23 Jun 2013

TL;DR: This work addresses the problem of long-term object tracking, where the object may become occluded or leave-the-view, and develops simple but effective algorithms that alternate between tracking and learning a good appearance model given a track.

...read moreread less

Abstract: We address the problem of long-term object tracking, where the object may become occluded or leave-the-view. In this setting, we show that an accurate appearance model is considerably more effective than a strong motion model. We develop simple but effective algorithms that alternate between tracking and learning a good appearance model given a track. We show that it is crucial to learn from the "right" frames, and use the formalism of self-paced curriculum learning to automatically select such frames. We leverage techniques from object detection for learning accurate appearance-based templates, demonstrating the importance of using a large negative training set (typically not used for tracking). We describe both an offline algorithm (that processes frames in batch) and a linear-time on-line (i.e. causal) algorithm that approaches real-time performance. Our models significantly outperform prior art, reducing the average error on benchmark videos by a factor of 4.

...read moreread less

Journal Article•DOI•

Enhancing Cognition with Video Games: A Multiple Game Training Study

[...]

Adam Oei¹, Michael D. Patterson¹•Institutions (1)

Nanyang Technological University¹

13 Mar 2013-PLOS ONE

TL;DR: It is concluded that training specific cognitive abilities frequently in a video game improves performance in tasks that share common underlying demands, and many video game-related cognitive improvements to cognition may be attributed to near-transfer effects.

...read moreread less

Abstract: Background Previous evidence points to a causal link between playing action video games and enhanced cognition and perception. However, benefits of playing other video games are under-investigated. We examined whether playing non-action games also improves cognition. Hence, we compared transfer effects of an action and other non-action types that required different cognitive demands. Methodology/Principal Findings We instructed 5 groups of non-gamer participants to play one game each on a mobile device (iPhone/iPod Touch) for one hour a day/five days a week over four weeks (20 hours). Games included action, spatial memory, match-3, hidden- object, and an agent-based life simulation. Participants performed four behavioral tasks before and after video game training to assess for transfer effects. Tasks included an attentional blink task, a spatial memory and visual search dual task, a visual filter memory task to assess for multiple object tracking and cognitive control, as well as a complex verbal span task. Action game playing eliminated attentional blink and improved cognitive control and multiple-object tracking. Match-3, spatial memory and hidden object games improved visual search performance while the latter two also improved spatial working memory. Complex verbal span improved after match-3 and action game training. Conclusion/Significance Cognitive improvements were not limited to action game training alone and different games enhanced different aspects of cognition. We conclude that training specific cognitive abilities frequently in a video game improves performance in tasks that share common underlying demands. Overall, these results suggest that many video game-related cognitive improvements may not be due to training of general broad cognitive systems such as executive attentional control, but instead due to frequent utilization of specific cognitive processes during game play. Thus, many video game training related improvements to cognition may be attributed to near-transfer effects.

...read moreread less

Proceedings Article•DOI•

A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching

[...]

Pradipto Das¹, Chenliang Xu¹, Richard F. Doell¹, Jason J. Corso¹•Institutions (1)

University at Buffalo¹

23 Jun 2013

TL;DR: This paper proposes a hybrid system consisting of a low level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors and a high level module to produce final lingual descriptions that captures the most relevant contents of a video in a natural language description.

...read moreread less

Abstract: The problem of describing images through natural language has gained importance in the computer vision community. Solutions to image description have either focused on a top-down approach of generating language through combinations of object detections and language models or bottom-up propagation of keyword tags from training images to test images through probabilistic or nearest neighbor techniques. In contrast, describing videos with natural language is a less studied problem. In this paper, we combine ideas from the bottom-up and top-down approaches to image description and propose a method for video description that captures the most relevant contents of a video in a natural language description. We propose a hybrid system consisting of a low level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors and a high level module to produce final lingual descriptions. We compare the results of our system to human descriptions in both short and long forms on two datasets, and demonstrate that final system output has greater agreement with the human descriptions than any single level.

...read moreread less

Proceedings Article•DOI•

Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines

[...]

Shuran Song¹, Jianxiong Xiao¹•Institutions (1)

Princeton University¹

01 Dec 2013

TL;DR: A unified benchmark dataset of 100 RGBD videos with high diversity is constructed, different kinds of RGBD tracking algorithms using 2D or 3D model are proposed, and a quantitative comparison of various algorithms with RGB or RGBD input is presented.

...read moreread less

Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonable size and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D model, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.

...read moreread less

Proceedings Article•DOI•

The Visual Object Tracking VOT2013 Challenge Results

[...]

Matej Kristan¹, Roman Pflugfelder², Ale Leonardis³, Jiri Matas, Fatih Porikli⁴, Luka Cehovin¹, Georg Nebehay², Gustavo Fernandez², Toma Vojir, Adam Gatt⁵, Ahmad Khajenezhad⁶, Ahmed Salahledin⁷, Ali Soltani-Farani⁶, Ali Zarezade⁶, Alfredo Petrosino⁸, Anthony Milton⁹, Behzad Bozorgtabar¹⁰, Bo Li⁸, Chee Seng Chan¹¹, Cherkeng Heng⁸, Dale A. Ward⁹, David Kearney⁹, Dorothy Monekosso¹², Hakki Can Karaimer¹³, Hamid R. Rabiee⁶, Jianke Zhu¹⁴, Jin Gao, Jingjing Xiao³, Junge Zhang¹⁵, Junliang Xing, Kaiqi Huang¹⁵, Karel Lebeda¹⁶, Lijun Cao¹⁵, Mario Edoardo Maresca¹⁷, Mei Kuan Lim¹¹, Mohamed El Helw⁷, Michael Felsberg⁸, Paolo Remagnino¹⁸, Richard Bowden¹⁶, Roland Goecke⁴, Rustam Stolkin⁴, Samantha Yueying Lim⁸, Sara Maher⁷, Sebastien Poullot¹⁹, Sebastien Wong⁵, Shin'Ichi Satoh¹⁹, Weihua Chen¹⁵, Weiming Hu, Xiaoqin Zhang, Yang Li¹⁴, ZhiHeng Niu⁸ - Show less +47 more•Institutions (19)

University of Ljubljana¹, Austrian Institute of Technology², University of Birmingham³, Australian National University⁴, Defence Science and Technology Organisation⁵, Sharif University of Technology⁶, Nile University⁷, Panasonic⁸, University of South Australia⁹, University of Canberra¹⁰, University of Malaya¹¹, University of the West of England¹², İzmir Institute of Technology¹³, Zhejiang University¹⁴, Chinese Academy of Sciences¹⁵, University of Surrey¹⁶, Parthenope University of Naples¹⁷, Kingston University¹⁸, Hitotsubashi University¹⁹

02 Dec 2013

TL;DR: The evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset are presented, offering a more systematic comparison of the trackers.

...read moreread less

Abstract: Visual tracking has attracted a significant attention in the last few decades. The recent surge in the number of publications on tracking-related problems have made it almost impossible to follow the developments in the field. One of the reasons is that there is a lack of commonly accepted annotated data-sets and standardized evaluation protocols that would allow objective comparison of different tracking methods. To address this issue, the Visual Object Tracking (VOT) workshop was organized in conjunction with ICCV2013. Researchers from academia as well as industry were invited to participate in the first VOT2013 challenge which aimed at single-object visual trackers that do not apply pre-learned models of object appearance (model-free). Presented here is the VOT2013 benchmark dataset for evaluation of single-object visual trackers as well as the results obtained by the trackers competing in the challenge. In contrast to related attempts in tracker benchmarking, the dataset is labeled per-frame by visual attributes that indicate occlusion, illumination change, motion change, size change and camera motion, offering a more systematic comparison of the trackers. Furthermore, we have designed an automated system for performing and evaluating the experiments. We present the evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset. The dataset, the evaluation tools and the tracker rankings are publicly available from the challenge website (http://votchallenge.net).

...read moreread less

Journal Article•DOI•

Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval

[...]

Jingkuan Song¹, Yi Yang², Zi Huang¹, Heng Tao Shen¹, Jiebo Luo³ - Show less +1 more•Institutions (3)

University of Queensland¹, Carnegie Mellon University², University of Rochester³

01 Dec 2013-IEEE Transactions on Multimedia

TL;DR: A novel approach-Multiple Feature Hashing (MFH) to tackle both the accuracy and the scalability issues of NDVR and shows that the proposed method outperforms the state-of-the-art techniques in both accuracy and efficiency.

...read moreread less

Abstract: Near-duplicate video retrieval (NDVR) has recently attracted much research attention due to the exponential growth of online videos. It has many applications, such as copyright protection, automatic video tagging and online video monitoring. Many existing approaches use only a single feature to represent a video for NDVR. However, a single feature is often insufficient to characterize the video content. Moreover, while the accuracy is the main concern in previous literatures, the scalability of NDVR algorithms for large scale video datasets has been rarely addressed. In this paper, we present a novel approach-Multiple Feature Hashing (MFH) to tackle both the accuracy and the scalability issues of NDVR. MFH preserves the local structural information of each individual feature and also globally considers the local structures for all the features to learn a group of hash functions to map the video keyframes into the Hamming space and generate a series of binary codes to represent the video dataset. We evaluate our approach on a public video dataset and a large scale video dataset consisting of 132,647 videos collected from YouTube by ourselves. This dataset has been released (http://itee.uq.edu.au/shenht/UQ_VIDEO/). The experimental results show that the proposed method outperforms the state-of-the-art techniques in both accuracy and efficiency.

...read moreread less

Proceedings Article•DOI•

Part-Based Visual Tracking with Online Latent Structural Learning

[...]

Rui Yao, Qinfeng Shi¹, Chunhua Shen¹, Yanning Zhang, Anton van den Hengel¹ - Show less +1 more•Institutions (1)

University of Adelaide¹

23 Jun 2013

TL;DR: This work extends the online algorithm pegasos to the structured prediction case (i.e., predicting the location of the bounding boxes) with latent part variables and shows that the method outperforms the state-of-the-art (linear and non-linear kernel) trackers.

...read moreread less

Abstract: Despite many advances made in the area, deformable targets and partial occlusions continue to represent key problems in visual tracking. Structured learning has shown good results when applied to tracking whole targets, but applying this approach to a part-based target model is complicated by the need to model the relationships between parts, and to avoid lengthy initialisation processes. We thus propose a method which models the unknown parts using latent variables. In doing so we extend the online algorithm pegasos to the structured prediction case (i.e., predicting the location of the bounding boxes) with latent part variables. To better estimate the parts, and to avoid over-fitting caused by the extra model complexity/capacity introduced by the parts, we propose a two-stage training process, based on the primal rather than the dual form. We then show that the method outperforms the state-of-the-art (linear and non-linear kernel) trackers.

...read moreread less

Journal Article•DOI•

Learning to Track and Identify Players from Broadcast Sports Videos

[...]

Wei-Lwun Lu¹, Jo-Anne Ting², James J. Little³, Kevin Murphy¹•Institutions (3)

Google¹, Bosch², University of British Columbia³

01 Jul 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A system that possesses the ability to detect and track multiple players, estimates the homography between video frames and the court, and identifies the players, and proposes a novel Linear Programming (LP) Relaxation algorithm for predicting the best player identification in a video clip.

...read moreread less

Abstract: Tracking and identifying players in sports videos filmed with a single pan-tilt-zoom camera has many applications, but it is also a challenging problem. This paper introduces a system that tackles this difficult task. The system possesses the ability to detect and track multiple players, estimates the homography between video frames and the court, and identifies the players. The identification system combines three weak visual cues, and exploits both temporal and mutual exclusion constraints in a Conditional Random Field (CRF). In addition, we propose a novel Linear Programming (LP) Relaxation algorithm for predicting the best player identification in a video clip. In order to reduce the number of labeled training data required to learn the identification system, we make use of weakly supervised learning with the assistance of play-by-play texts. Experiments show promising results in tracking, homography estimation, and identification. Moreover, weakly supervised learning with play-by-play texts greatly reduces the number of labeled training examples required. The identification system can achieve similar accuracies by using merely 200 labels in weakly supervised learning, while a strongly supervised approach needs a least 20,000 labels.

...read moreread less

Journal Article•DOI•

A General Framework for Tracking Multiple People from a Moving Camera

[...]

Wongun Choi¹, Caroline Pantofaru², Silvio Savarese¹•Institutions (2)

University of Michigan¹, Willow Garage²

01 Jul 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Experimental evidence shows that the proposed method can robustly estimate a camera's motion from dynamic scenes and stably track people who are moving independently or interacting.

...read moreread less

Abstract: In this paper, we present a general framework for tracking multiple, possibly interacting, people from a mobile vision platform. To determine all of the trajectories robustly and in a 3D coordinate system, we estimate both the camera's ego-motion and the people's paths within a single coherent framework. The tracking problem is framed as finding the MAP solution of a posterior probability, and is solved using the reversible jump Markov chain Monte Carlo (RJ-MCMC) particle filtering method. We evaluate our system on challenging datasets taken from moving cameras, including an outdoor street scene video dataset, as well as an indoor RGB-D dataset collected in an office. Experimental evidence shows that the proposed method can robustly estimate a camera's motion from dynamic scenes and stably track people who are moving independently or interacting.

...read moreread less

Journal Article•DOI•

Robust Visual Tracking Using an Adaptive Coupled-Layer Visual Model

[...]

Luka Cehovin¹, Matej Kristan¹, Ales Leonardis²•Institutions (2)

University of Ljubljana¹, University of Birmingham²

01 Apr 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel coupled-layer visual model that combines the target's global and local appearance by interlacing two layers is proposed that outperforms the related trackers by having a smaller failure rate as well as better accuracy.

...read moreread less

Abstract: This paper addresses the problem of tracking objects which undergo rapid and significant appearance changes. We propose a novel coupled-layer visual model that combines the target's global and local appearance by interlacing two layers. The local layer in this model is a set of local patches that geometrically constrain the changes in the target's appearance. This layer probabilistically adapts to the target's geometric deformation, while its structure is updated by removing and adding the local patches. The addition of these patches is constrained by the global layer that probabilistically models the target's global visual properties, such as color, shape, and apparent local motion. The global visual properties are updated during tracking using the stable patches from the local layer. By this coupled constraint paradigm between the adaptation of the global and the local layer, we achieve a more robust tracking through significant appearance changes. We experimentally compare our tracker to 11 state-of-the-art trackers. The experimental results on challenging sequences confirm that our tracker outperforms the related trackers in many cases by having a smaller failure rate as well as better accuracy. Furthermore, the parameter analysis shows that our tracker is stable over a range of parameter values.

...read moreread less

Journal Article•DOI•

Multiple Hypothesis Tracking for Cluttered Biological Image Sequences

[...]

Nicolas Chenouard¹, Isabelle Bloch², Jean-Christophe Olivo-Marin¹•Institutions (2)

Pasteur Institute¹, Télécom ParisTech²

01 Nov 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The proposed method for simultaneously tracking thousands of targets in biological image sequences, which is of major importance in modern biology, is presented and the benefits of advanced Bayesian tracking techniques for the accurate computational modeling of dynamical biological processes are demonstrated.

...read moreread less

Abstract: In this paper, we present a method for simultaneously tracking thousands of targets in biological image sequences, which is of major importance in modern biology. The complexity and inherent randomness of the problem lead us to propose a unified probabilistic framework for tracking biological particles in microscope images. The framework includes realistic models of particle motion and existence and of fluorescence image features. For the track extraction process per se, the very cluttered conditions motivate the adoption of a multiframe approach that enforces tracking decision robustness to poor imaging conditions and to random target movements. We tackle the large-scale nature of the problem by adapting the multiple hypothesis tracking algorithm to the proposed framework, resulting in a method with a favorable tradeoff between the model complexity and the computational cost of the tracking procedure. When compared to the state-of-the-art tracking techniques for bioimaging, the proposed algorithm is shown to be the only method providing high-quality results despite the critically poor imaging conditions and the dense target presence. We thus demonstrate the benefits of advanced Bayesian tracking techniques for the accurate computational modeling of dynamical biological processes, which is promising for further developments in this domain.

...read moreread less

Collapse