Proceedings ArticleDOI

Recognizing interaction between human performers using 'key pose doublet'

TL;DR: A graph theoretic approach for recognizing interactions between two human performers in a video clip; key poses of each performer are selected by a graph-centrality measure, and the same measure is applied to all possible combinations of the two performers' key poses to select the set of 'key pose doublets' that best represents the corresponding action.
Abstract: In this paper, we propose a graph theoretic approach for recognizing interactions between two human performers present in a video clip. We primarily observe the human poses of each performer and derive descriptors that capture the motion patterns of the poses. From an initial dictionary of poses (visual words), we extract key poses (or key words) by ranking the poses on a centrality measure of graph connectivity. We argue that the key poses are graph nodes which share a close semantic relationship (in terms of a suitable edge weight function) with all other pose nodes and hence lie at the central part of the graph. We apply the same centrality measure on all possible combinations of the key poses of the two performers to select the set of 'key pose doublets' that best represents the corresponding action. The results on a standard interaction recognition dataset show the robustness of our approach compared to the present state-of-the-art method.
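A minimal sketch of the kind of centrality-based key-pose selection described above, assuming poses are given as descriptor vectors and that edge weights come from a Gaussian similarity kernel (the paper's actual descriptors and edge-weight function may differ; function and parameter names here are illustrative):

```python
import numpy as np

def rank_key_poses(pose_descriptors, k, sigma=1.0):
    """Rank candidate poses by a simple graph-centrality score.

    pose_descriptors: (n, d) array, one descriptor per pose in the initial dictionary.
    Returns indices of the k most central poses ("key poses").
    Edge weights use a Gaussian kernel on descriptor distance (an assumption,
    not necessarily the paper's edge-weight function).
    """
    X = np.asarray(pose_descriptors, dtype=float)
    # Pairwise squared distances between pose descriptors.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no self-loops
    # Centrality: average edge weight to all other pose nodes
    # (poses semantically close to everything score highest).
    centrality = W.mean(axis=1)
    return np.argsort(centrality)[::-1][:k]

# Toy usage: 50 random 16-D pose descriptors, pick 5 key poses.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    poses = rng.normal(size=(50, 16))
    print(rank_key_poses(poses, k=5))
```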
Citations
Journal ArticleDOI


TL;DR: The survey introduced in this paper covers the lack of a complete description of the most important public datasets for video-based human activity and action recognition, and guides researchers in the selection of the most suitable dataset for benchmarking their algorithms.
Abstract: Highlights:
  • Description of datasets for video-based human activity and action recognition.
  • 68 datasets reported: 28 for heterogeneous and 40 for specific human actions.
  • Useful data, such as download links, published works and ground truth, are provided.
  • Datasets are compared and classified.
Vision-based human action and activity recognition has an increasing importance in the computer vision community, with applications to visual surveillance, video retrieval and human-computer interaction. In recent years, more and more datasets dedicated to human action and activity recognition have been created. The use of these datasets allows us to compare different recognition systems with the same input data. The survey introduced in this paper covers the lack of a complete description of the most important public datasets for video-based human activity and action recognition and guides researchers in the selection of the most suitable dataset for benchmarking their algorithms.

373 citations

Proceedings ArticleDOI


29 Oct 2012
TL;DR: A new random forest structure is proposed, called the multi-class balanced random forest, which makes a good trade-off between tree balance and discriminative ability and significantly outperforms the state of the art on the human activity prediction problem.
Abstract: Early recognition and prediction of human activities are of great importance in video surveillance; e.g., by recognizing a criminal activity at its beginning stage, it is possible to avoid unfortunate outcomes. We address early activity recognition by developing a Spatial-Temporal Implicit Shape Model (STISM), which characterizes the space-time structure of the sparse local features extracted from a video. The early recognition of human activities is accomplished by pattern matching through STISM. To enable efficient and robust matching, we propose a new random forest structure, called the multi-class balanced random forest, which makes a good trade-off between tree balance and discriminative ability. The prediction is done simultaneously for multiple classes, which saves both memory and computational cost. The experiments show that our algorithm significantly outperforms the state of the art on the human activity prediction problem.
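The trade-off between tree balance and discriminative power can be illustrated with a class-balanced bootstrap for each tree. This is a simplified stand-in for the authors' multi-class balanced random forest, not their STISM pipeline; it assumes integer class labels 0..K-1, each present in the training set:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def balanced_bootstrap(y, rng):
    """Draw an index sample with (roughly) equal counts per class.

    Assumes y contains integer labels 0..K-1, each appearing at least once.
    """
    classes = np.unique(y)
    per_class = int(np.bincount(y).min())     # size of the smallest class
    idx = [rng.choice(np.where(y == c)[0], size=per_class, replace=True)
           for c in classes]
    return np.concatenate(idx)

class BalancedForest:
    """Toy multi-class forest trained on class-balanced bootstrap samples."""
    def __init__(self, n_trees=50, seed=0):
        self.n_trees, self.rng = n_trees, np.random.default_rng(seed)
        self.trees = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        for _ in range(self.n_trees):
            idx = balanced_bootstrap(y, self.rng)
            tree = DecisionTreeClassifier(max_features="sqrt")
            self.trees.append(tree.fit(X[idx], y[idx]))
        return self

    def predict_proba(self, X):
        # Average per-tree class posteriors: prediction is done for all classes at once.
        return np.mean([t.predict_proba(X) for t in self.trees], axis=0)
```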

62 citations

Journal ArticleDOI


TL;DR: Propagative generalized Hough voting (HV) is proposed to propagate the label and spatio-temporal configuration information of local features via HV, with random projection trees used to index local features for fast matching.
Abstract: Generalized Hough voting (HV) has shown promising results in both object and action detection. However, most existing HV methods suffer when insufficient training data are provided. We propose propagative HV to address this limitation and apply it to human activity analysis. Instead of training a discriminative classifier for local feature voting, we match individual local features to propagate the label and spatiotemporal configuration information of local features via HV. To enable fast local feature matching, we index the local features using random projection trees (RPTs). RPTs can reveal the low-dimensional manifold structure to provide adaptive local feature matching. Moreover, as the RPT index can be built on either a labeled or an unlabeled dataset, it can be applied to different tasks, such as activity search (limited training) and recognition (sufficient training). The superior performance on benchmark datasets validates that our propagative HV can outperform state-of-the-art techniques in various activity analysis tasks, such as activity search, recognition, and prediction.
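To illustrate the random-projection-tree indexing that enables fast local feature matching, here is a minimal RPT sketch (median split along a random direction, labels returned from the matched leaf). The actual propagative HV voting step and split criteria in the paper are more involved; names below are illustrative:

```python
import numpy as np

class RPTree:
    """Minimal random projection tree for approximate nearest-neighbour matching."""

    def __init__(self, X, labels, leaf_size=10, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.X, self.labels = np.asarray(X, float), np.asarray(labels)
        self.root = self._build(np.arange(len(self.X)), leaf_size)

    def _build(self, idx, leaf_size):
        if len(idx) <= leaf_size:
            return ("leaf", idx)
        d = self.rng.normal(size=self.X.shape[1])      # random projection direction
        d /= np.linalg.norm(d)
        proj = self.X[idx] @ d
        thr = np.median(proj)                          # split at the median projection
        left, right = idx[proj <= thr], idx[proj > thr]
        if len(left) == 0 or len(right) == 0:
            return ("leaf", idx)
        return ("node", d, thr,
                self._build(left, leaf_size), self._build(right, leaf_size))

    def query(self, q):
        """Return labels of the leaf that q falls into (candidate matches for voting)."""
        node = self.root
        while node[0] == "node":
            _, d, thr, left, right = node
            node = left if q @ d <= thr else right
        return self.labels[node[1]]
```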

26 citations

Journal ArticleDOI


01 May 2014
TL;DR: A graph theoretic approach is proposed to recognize interactions between two human performers in a video; results on standard interaction datasets show the efficacy of the proposed approach compared to the state-of-the-art.
Abstract: A graph theoretic approach is proposed to recognize interactions (e.g., handshaking, punching, etc.) between two human performers in a video. Pose descriptors corresponding to each performer in the video are generated and clustered to form initial codebooks of human poses. Compact codebooks of dominating poses for each of the two performers are created by ranking the poses of the initial codebooks using two different methods. First, an average centrality measure of graph connectivity is introduced, where poses are nodes in the graph. The dominating poses are graph nodes sharing a close semantic relationship with all other pose nodes and hence are expected to be at the central part of the graph. Second, a novel similarity measure is introduced for ranking dominating poses. The 'pose doublets', all possible combinations of dominating poses of the two performers, are ranked using an improved centrality measure of a bipartite graph. The set of 'dominating pose doublets' that best represents the corresponding interaction is selected using a perceptual analysis technique. The recognition results on standard interaction datasets show the efficacy of the proposed approach compared to the state-of-the-art.
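A hedged sketch of ranking 'pose doublets' on a bipartite graph between the two performers' dominating poses. The Gaussian edge weight and the connectivity-weighted score below are placeholders for the paper's improved bipartite centrality measure, which is defined in the paper itself:

```python
import numpy as np

def rank_pose_doublets(poses_a, poses_b, top_k, sigma=1.0):
    """Score all (pose_a, pose_b) combinations on a bipartite graph.

    poses_a, poses_b: descriptor arrays for the dominating poses of performers A and B.
    Returns the top_k doublets as (index_in_A, index_in_B) pairs.
    """
    A, B = np.asarray(poses_a, float), np.asarray(poses_b, float)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))                  # |A| x |B| bipartite edge weights
    # Doublet score: edge weight scaled by the average connectivity of its endpoints
    # (a simple stand-in for an "improved" bipartite centrality).
    score = W * (W.mean(axis=1, keepdims=True) + W.mean(axis=0, keepdims=True)) / 2.0
    flat = np.argsort(score, axis=None)[::-1][:top_k]
    return [divmod(i, B.shape[0]) for i in flat]
```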

17 citations


Cites background or methods from "Recognizing interaction between human performers using 'key pose doublet'"


Journal ArticleDOI


TL;DR: A novel human interaction recognition method based on multiple-stage probability fusion that not only simplifies the extraction and representation of features, but also avoids erroneous feature extraction caused by occlusion.
Abstract: Vision-based human interactive behavior recognition is a challenging research topic in computer vision. There exist some important problems in current interaction recognition algorithms, such as very complex feature representation and inaccurate feature extraction induced by wrong human body segmentation. In order to solve these problems, a novel human interaction recognition method based on multiple-stage probability fusion is proposed in this paper. Taking the human bodies' contact in the interaction as a cut-off point, the process of the interaction can be divided into three stages: the start stage, the execution stage and the end stage. The two persons' motions are extracted and recognized separately in the start and end stages, when there is no contact between the persons; in the execution stage, the two persons' motion is extracted and recognized as a whole. In the recognition process, the final recognition result is obtained by weighted fusion of the probabilities from the different stages. The proposed method not only simplifies the extraction and representation of features, but also avoids erroneous feature extraction caused by occlusion. Experimental results on the UT-Interaction dataset demonstrate that the proposed method achieves better performance than other recent interaction recognition methods.
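The weighted fusion of per-stage class probabilities can be sketched as follows; the stage weights and class posteriors are illustrative values, not those learned or set in the paper:

```python
import numpy as np

def fuse_stage_probabilities(stage_probs, weights):
    """Weighted late fusion of per-stage class posteriors.

    stage_probs: list of (n_classes,) probability vectors, one per stage
                 (e.g. start, execution and end stages as in the abstract).
    weights:     one non-negative weight per stage (illustrative values below).
    """
    P = np.asarray(stage_probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    fused = (w[:, None] * P).sum(axis=0) / w.sum()
    return fused / fused.sum()                 # renormalise to a distribution

# Example: three stages, four interaction classes.
start = [0.10, 0.60, 0.20, 0.10]
execu = [0.05, 0.70, 0.15, 0.10]
end   = [0.20, 0.50, 0.20, 0.10]
print(fuse_stage_probabilities([start, execu, end], weights=[0.3, 0.4, 0.3]))
```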

13 citations


Cites background from "Recognizing interaction between human performers using 'key pose doublet'"


References


01 Jan 2005

19,237 citations

Proceedings Article


24 Aug 1981
TL;DR: In this paper, the spatial intensity gradient of the images is used to find a good match using a type of Newton-Raphson iteration, which can be generalized to handle rotation, scaling and shearing.
Abstract: Image registration finds a variety of applications in computer vision. Unfortunately, traditional image registration techniques tend to be costly. We present a new image registration technique that makes use of the spatial intensity gradient of the images to find a good match using a type of Newton-Raphson iteration. Our technique is faster because it examines far fewer potential matches between the images than existing techniques. Furthermore, this registration technique can be generalized to handle rotation, scaling and shearing. We show how our technique can be adapted for use in a stereo vision system.
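A minimal sketch of the gradient-based Newton-Raphson iteration for a pure translation, in the spirit of the registration technique described above. Production implementations add image pyramids, windowing, sub-pixel interpolation, and the rotation/scale/shear generalizations the abstract mentions; this shows only the core update:

```python
import numpy as np

def estimate_translation(I, J, n_iters=20):
    """Estimate a global translation (dx, dy) that warps image J onto image I.

    Uses the spatial intensity gradient of I and a least-squares (Newton-Raphson
    style) update; nearest-neighbour warping keeps the sketch short.
    """
    I = I.astype(float)
    p = np.zeros(2)                                    # current (dx, dy) estimate
    gy, gx = np.gradient(I)
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    A = np.stack([gx.ravel(), gy.ravel()], axis=1)     # per-pixel gradient rows
    for _ in range(n_iters):
        yw = np.clip((ys + p[1]).round().astype(int), 0, I.shape[0] - 1)
        xw = np.clip((xs + p[0]).round().astype(int), 0, I.shape[1] - 1)
        err = J[yw, xw].astype(float) - I              # residual at current estimate
        dp, *_ = np.linalg.lstsq(A, -err.ravel(), rcond=None)
        p += dp
        if np.linalg.norm(dp) < 1e-3:                  # converged
            break
    return p
```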

12,938 citations

Book


14 Nov 1995
TL;DR: This book introduces the fundamental concepts of graph theory, covering trees and distance, matchings and factors, connectivity and paths, graph coloring, planar graphs, and additional topics such as perfect graphs, matroids, Ramsey theory and random graphs.
Abstract: 1. Fundamental Concepts. Definitions and examples. Paths and proofs. Vertex degrees and counting. Degrees and algorithmic proof. 2. Trees and Distance. Basic properties. Spanning trees and enumeration. Optimization and trees. Eulerian graphs and digraphs. 3. Matchings and Factors. Matchings in bipartite graphs. Applications and algorithms. Matchings in general graphs. 4. Connectivity and Paths. Cuts and connectivity. k-connected graphs. Network flow problems. 5. Graph Coloring. Vertex colorings and upper bounds. Structure of k-chromatic graphs. Enumerative aspects. 6. Edges and Cycles. Line graphs and edge-coloring. Hamiltonian cycles. Complexity. 7. Planar Graphs. Embeddings and Euler's formula. Characterization of planar graphs. Parameters of planarity. 8. Additional Topics. Perfect graphs. Matroids. Ramsey theory. More extremal problems. Random graphs. Eigenvalues of graphs. Glossary of Terms. Glossary of Notation. References. Author Index. Subject Index.

7,114 citations


01 Jan 2001

4,064 citations

Journal ArticleDOI


TL;DR: A novel unsupervised learning method for human action categories that can recognize and localize multiple actions in long and complex video sequences containing multiple motions.
Abstract: We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories. This is achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (pLSA) model and Latent Dirichlet Allocation (LDA). Our approach can handle noisy feature points arising from dynamic backgrounds and moving cameras due to the use of probabilistic models. Given a novel video sequence, the algorithm can categorize and localize the human action(s) contained in the video. We test our algorithm on three challenging datasets: the KTH human motion dataset, the Weizmann human action dataset, and a recent dataset of figure skating actions. Our results reflect the promise of such a simple approach. In addition, our algorithm can recognize and localize multiple actions in long and complex video sequences containing multiple motions.
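A compact sketch of the bag-of-spatial-temporal-words plus topic-model recipe described above, using k-means for the codebook and scikit-learn's LDA. Descriptor extraction (space-time interest points) is assumed to happen elsewhere, and all parameter values are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def video_topics(descriptor_sets, n_words=200, n_topics=6, seed=0):
    """Unsupervised topic discovery over bags of spatial-temporal words.

    descriptor_sets: list with one (n_i, d) array of space-time interest-point
    descriptors per video. Returns each video's dominant topic index, used here
    as an (unsupervised) action-category assignment.
    """
    all_desc = np.vstack(descriptor_sets)
    # Quantise descriptors into a visual-word codebook.
    codebook = KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_desc)
    # One "document" per video: histogram of visual-word counts.
    docs = np.zeros((len(descriptor_sets), n_words), dtype=int)
    for i, D in enumerate(descriptor_sets):
        words = codebook.predict(D)
        docs[i] = np.bincount(words, minlength=n_words)
    # Learn latent topics (intermediate action categories) with LDA.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    theta = lda.fit_transform(docs)            # per-video topic mixture
    return theta.argmax(axis=1)
```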

1,406 citations