Showing papers by "Ioannis Pitas" published in 2012


Journal ArticleDOI
TL;DR: The proposed view-invariant action recognition method is the first one that has been tested in challenging experimental setups, which indicates its effectiveness in dealing with most of the open issues in action recognition.
Abstract: In this paper, a novel view-invariant action recognition method based on neural network representation and recognition is proposed. The novel representation of action videos is based on learning spatially related human body posture prototypes using self-organizing maps. Fuzzy distances from human body posture prototypes are used to produce a time-invariant action representation. Multilayer perceptrons are used for action classification. The algorithm is trained using data from a multi-camera setup. An arbitrary number of cameras can be used in order to recognize actions using a Bayesian framework. The proposed method can also be applied to videos depicting interactions between humans, without any modification. The use of information captured from different viewing angles leads to high classification performance. The proposed method is the first one that has been tested in challenging experimental setups, which indicates its effectiveness in dealing with most of the open issues in action recognition.

143 citations
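The fuzzy-distance step described in the abstract above can be illustrated with a minimal sketch. It assumes Euclidean distances, the standard fuzzy c-means membership formula, and a fuzzification parameter `m = 2.0`; the function names, the toy prototypes, and these choices are illustrative assumptions and are not taken from the paper. Averaging the per-frame memberships gives a vector whose length depends only on the number of prototypes, not on the video duration.

```python
import math


def fuzzy_memberships(posture, prototypes, m=2.0):
    """Fuzzy memberships of one posture vector to each prototype (they sum to 1)."""
    dists = [math.dist(posture, p) for p in prototypes]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The posture coincides with one or more prototypes: share membership among them.
        hits = sum(zeros)
        return [1.0 / hits if z else 0.0 for z in zeros]
    # Fuzzy c-means style weighting: closer prototypes receive larger memberships.
    weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def action_vector(postures, prototypes, m=2.0):
    """Mean membership vector over all frames of one action video."""
    acc = [0.0] * len(prototypes)
    for posture in postures:
        for k, u in enumerate(fuzzy_memberships(posture, prototypes, m)):
            acc[k] += u
    return [a / len(postures) for a in acc]


# Hypothetical 2-D posture prototypes (assumed here to come from a trained SOM).
prototypes = [(0.0, 0.0), (1.0, 0.0)]
short_video = [(0.0, 0.0), (1.0, 0.0)]
long_video = short_video * 5  # same action, five times as many frames
```

Both `action_vector(short_video, prototypes)` and `action_vector(long_video, prototypes)` have one entry per prototype, which is what makes the representation usable as a fixed-size classifier input.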


Journal ArticleDOI
TL;DR: The discriminant movement representation combined with camera viewpoint identification and a nearest centroid classification step leads to a high human movement classification accuracy.

79 citations


Journal ArticleDOI
TL;DR: A view-invariant activity-independent person identification method based on human activity information is proposed and has been tested in challenging problem setups, simulating real application situations.
Abstract: In this paper, a novel view invariant person identification method based on human activity information is proposed. Unlike most methods proposed in the literature, in which “walk” (i.e., gait) is assumed to be the only activity exploited for person identification, we incorporate several activities in order to identify a person. A multicamera setup is used to capture the human body from different viewing angles. Fuzzy vector quantization and linear discriminant analysis are exploited in order to provide a discriminant activity representation. Person identification, activity recognition, and viewing angle specification results are obtained for all the available cameras independently. By properly combining these results, a view-invariant activity-independent person identification method is obtained. The proposed approach has been tested in challenging problem setups, simulating real application situations. Experimental results are very promising.

73 citations


Journal ArticleDOI
TL;DR: The proposed method incorporates appropriate discriminant constraints in the NMF decomposition cost function in order to address the problem of finding discriminant projections that enhance class separability in the reduced dimensional projection space, while taking into account subclass information.

40 citations


Proceedings Article
18 Oct 2012
TL;DR: A novel action recognition method that overcomes the assumption that the person under consideration is visible from all the cameras forming the adopted camera setup and exploits information coming from an arbitrary number of viewing angles is proposed.
Abstract: While action recognition methods exploiting information coming from multiple viewing angles have been proposed in order to overcome the known viewing angle assumption of single-view methods, they set the assumption that the person under consideration is visible from all the cameras forming the adopted camera setup. However, this assumption is not usually met in real applications and, thus, their applicability is limited. In this paper we propose a novel action recognition method that overcomes this assumption. The method exploits information coming from an arbitrary number of viewing angles. The classification procedure involves Fuzzy Vector Quantization and Artificial Neural Networks. Experiments on two publicly available action recognition databases evaluate the effectiveness of the proposed action recognition approach.

26 citations


Proceedings ArticleDOI
10 Jun 2012
TL;DR: This paper proposes a novel method aiming at view-independent multi-view action recognition that performs single-view action representation and classification on all the available videos depicting the person under consideration independently.
Abstract: In this paper we propose a novel method aiming at view-independent multi-view action recognition. Instead of combining the information provided by all the cameras forming the camera setup for action representation and classification, we perform single-view action representation and classification on all the available videos depicting the person under consideration independently. Action representation involves training a self-organizing neural network, followed by fuzzy vector quantization. Action classification is performed by a feedforward neural network which is trained for view-invariant action recognition. Combining the multiple action classification results in the recognition phase, based on Bayesian learning, leads to high action recognition accuracy. The performance of the proposed action recognition method is evaluated on two publicly available databases, aiming at different application scenarios.

24 citations
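One simple way to combine independent single-view classification results, as the abstract above describes at a high level, is a product rule over per-view class posteriors. The sketch below assumes the views are conditionally independent given the action class and uses a uniform prior by default; it is a generic Bayesian fusion rule, not necessarily the combination scheme used in the paper, and `combine_views` is a hypothetical name.

```python
import math


def combine_views(view_posteriors, prior=None, eps=1e-12):
    """Fuse per-view class posteriors P(c | view_k) into one posterior.

    Under conditional independence of the views given the class:
        P(c | v_1..v_K)  is proportional to  P(c)^(1-K) * prod_k P(c | v_k)
    """
    n_classes = len(view_posteriors[0])
    n_views = len(view_posteriors)
    if prior is None:
        prior = [1.0 / n_classes] * n_classes  # assumed uniform class prior
    log_scores = []
    for c in range(n_classes):
        score = (1 - n_views) * math.log(prior[c])
        for posterior in view_posteriors:
            score += math.log(max(posterior[c], eps))  # guard against log(0)
        log_scores.append(score)
    # Normalize in the log domain for numerical stability.
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two cameras, two action classes; any number of available cameras can be passed.
fused = combine_views([[0.6, 0.4], [0.7, 0.3]])
```

Because the function accepts a list of any length, a camera that does not see the person can simply be left out of `view_posteriors`.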


Proceedings ArticleDOI
25 Mar 2012
TL;DR: A novel method aiming at eating and drinking activity recognition is presented, where activities are considered as a sequence of human body poses forming 3D volumes, in which the third dimension refers to time.
Abstract: Eating and drinking activity recognition can be considered a distinct research field within the activity recognition area. An application capable of identifying human eating and drinking activity can be really useful in a smart home environment that aims to extend the independent living of older persons in the early stages of dementia. In this paper a novel method aiming at eating and drinking activity recognition is presented. Activities are considered as a sequence of human body poses forming 3D volumes, in which the third dimension refers to time. Fuzzy Vector Quantization is performed to associate the 3D volume representation of an activity video with 3D volume prototypes, and Linear Discriminant Analysis is used to map activity representations to a low-dimensional discriminant feature space. In this space a simple Nearest Centroid classification procedure leads to very satisfactory classification results.

21 citations
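The final Nearest Centroid step mentioned in the abstract above can be sketched as follows. The sketch assumes feature vectors have already been mapped to the low-dimensional discriminant space (the Fuzzy Vector Quantization and Linear Discriminant Analysis stages are not reproduced), uses Euclidean distance, and relies on made-up toy vectors and labels.

```python
import math


def fit_centroids(samples, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Toy 2-D "discriminant space" features with hypothetical activity labels.
train_x = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
train_y = ["eat", "eat", "drink", "drink"]
centroids = fit_centroids(train_x, train_y)
```

Calling `predict(centroids, [1.0, 1.0])` then returns the label of the nearest class mean.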


Journal ArticleDOI
TL;DR: A novel set of multiplicative update rules is proposed, which is independent from any kind of learning rate parameter, provides computational efficiency compared to the conventional batch training approach and is easy to implement.

10 citations


Journal ArticleDOI
TL;DR: Experimental results on pedestrian detection indicate the efficiency of the proposed method in shape matching; all online training operations and both online and offline testing operations can be performed in O(log n) time.

10 citations


Journal ArticleDOI
TL;DR: This paper investigates the possibility of extracting latent aspects of a video in order to develop a video fingerprinting framework using a generative probabilistic model, namely the Latent Dirichlet Allocation (LDA).

9 citations


Proceedings ArticleDOI
06 Sep 2012
TL;DR: A novel algorithm for discriminating pornographic and assorted benign images, each categorized into semantic subclasses, based on a tree-structured ensemble of strong Random Forest classifiers is presented, which achieves competitive performance both in terms of response time and accuracy when compared to the state-of-the-art.
Abstract: We present a novel algorithm for discriminating pornographic and assorted benign images, each categorized into semantic subclasses. The algorithm exploits connectedness and coherence properties in skin image regions in order to capture alarming Regions of Interest (ROIs). The technique to identify ROIs in an image employs a region-splitting scheme, in which the image plane is recursively partitioned into quadrants. Splitting is achieved by considering both the accumulation of skin pixels and texture coherence. This processing step is proven to significantly boost the accuracy and reduction of running time demands, even in the presence of sparse noise due to errors attributed to skin segmentation. For detected ROIs, we extract 15 rough color and spatial features computed from the pixels residing in the ROI. A novel classification scheme based on a tree-structured ensemble of strong Random Forest classifiers is also proposed. The method achieves competitive performance both in terms of response time and accuracy when compared to the state-of-the-art.
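The recursive quadrant partitioning described in the abstract above resembles a quadtree split. The sketch below is a simplified illustration that uses only the accumulation of skin pixels: a region is kept as an ROI when its skin fraction reaches an assumed threshold `hi`, discarded when it contains no skin, and split into four quadrants otherwise. The texture-coherence criterion from the paper is omitted, and the threshold, `min_size`, and the function name are assumptions.

```python
def find_rois(mask, x0, y0, w, h, hi=0.8, min_size=2):
    """Return a list of (x, y, width, height) regions that are mostly skin.

    mask is a list of rows holding 0 (non-skin) or 1 (skin) per pixel.
    """
    skin = sum(mask[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w))
    if skin == 0:
        return []  # nothing of interest in this region
    if skin / (w * h) >= hi:
        return [(x0, y0, w, h)]  # dense skin region: keep it as an ROI
    if w <= min_size or h <= min_size:
        return []  # too small to split further and not dense enough
    hw, hh = w // 2, h // 2
    quadrants = [
        (x0, y0, hw, hh),
        (x0 + hw, y0, w - hw, hh),
        (x0, y0 + hh, hw, h - hh),
        (x0 + hw, y0 + hh, w - hw, h - hh),
    ]
    rois = []
    for qx, qy, qw, qh in quadrants:
        rois.extend(find_rois(mask, qx, qy, qw, qh, hi, min_size))
    return rois


# Toy 4x4 skin mask with a dense 2x2 skin block in the top-left corner.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
rois = find_rois(mask, 0, 0, 4, 4)
```

Discarding empty quadrants early is what keeps the recursion cheap on images where skin regions are sparse.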

Proceedings ArticleDOI
25 Mar 2012
TL;DR: A novel method for object tracking in videos which can find application in eating and drinking activity recognition is proposed, where the query object is detected in the first video frame, extracting a new query image.
Abstract: A novel method for object tracking in videos which can find application in eating and drinking activity recognition is proposed. The query object is detected in the first video frame, extracting a new query image. The initial query image along with the obtained query image are then compared with patches within a determined search region around the position of the detected object in the previous frame. For each image, the local steering kernels are extracted and the similarity between a query image and the patches of the video frame is measured by calculating the cosine similarity. The proposed method finds application in eating and drinking activity recognition.
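The matching step in the abstract above, which scores candidate patches against a query by cosine similarity, can be sketched as below. The sketch assumes each image has already been reduced to a flat feature vector (the local steering kernel extraction is not reproduced) and uses a single query for brevity; the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def best_patch(query_features, patch_features):
    """Return (index, score) of the patch most similar to the query."""
    scores = [cosine_similarity(query_features, p) for p in patch_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy feature vectors for one query and three patches from the search region.
query = [1.0, 0.0]
patches = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
index, score = best_patch(query, patches)
```

Cosine similarity ignores vector magnitude, so the second patch matches the query perfectly even though its features are scaled by two.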

Proceedings ArticleDOI
03 Dec 2012
TL;DR: A way of using the Audio-Visual Description Profile (AVDP) of the MPEG-7 standard for stereo video content description in such a way that 3D video content can be correctly and consistently described is proposed.
Abstract: In this paper we propose a way of using the Audio-Visual Description Profile (AVDP) of the MPEG-7 standard for stereo video content description. Our aim is to provide means of using AVDP in such a way that 3D video content can be correctly and consistently described. Since AVDP semantics do not include ways of dealing with 3D video content, a new semantic framework within AVDP is proposed. Finally, we show some examples of using AVDP to describe the results of semantic analysis algorithms on stereo video content.

Proceedings ArticleDOI
25 Mar 2012
TL;DR: A new structure that is based on Anthropos-7 and extends the description from single-view to multi-view multimedia content is proposed and it is shown that the proposed structure can be used to describe stereo, video plus depth and multi-view video content.
Abstract: In this paper a new framework for multi-view video content is discussed. The latter framework is based on the MPEG-7 description schemes and is an extension of the Anthropos-7 framework. Furthermore, we propose a new structure that is based on Anthropos-7 and extends the description from single-view to multi-view multimedia content. Moreover, we show that the proposed structure can be used to describe stereo, video plus depth and multi-view video content. The aim of this proposal is to achieve better results in the indexing, filtering and retrieval processes of multi-capturing systems in terms of time complexity.

Proceedings Article
18 Oct 2012
TL;DR: A novel appearance-based method for visual object tracking of rigid objects with pose variations and small scale and 2-dimensional rotation changes is proposed.
Abstract: A novel appearance-based method for visual object tracking of rigid objects with pose variations and small scale and 2-dimensional rotation changes is proposed. The algorithm employs a bank of Gabor filters for computing the salient object features, which represent the object model. In each frame, candidate objects of a search region are extracted randomly, following a 2-dimensional Gaussian distribution. The object in the current frame is the candidate object whose cosine similarity to the detected object in the first frame and the object instance in a previous frame where significant change in the object appearance was last observed is maximal.
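The candidate extraction step in the abstract above draws positions from a 2-dimensional Gaussian distribution. A minimal sketch of that sampling is given below, assuming the distribution is centered on the previous object position and has independent x and y components; the centering, the standard deviations, and the function name are assumptions not stated in the abstract.

```python
import random


def sample_candidates(center, sigma, n, seed=None):
    """Draw n candidate (x, y) positions from an axis-aligned 2-D Gaussian."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    cx, cy = center
    sx, sy = sigma
    return [(rng.gauss(cx, sx), rng.gauss(cy, sy)) for _ in range(n)]


# Hypothetical previous object position and search spread, in pixels.
candidates = sample_candidates(center=(120.0, 80.0), sigma=(5.0, 5.0), n=5000, seed=0)
mean_x = sum(x for x, _ in candidates) / len(candidates)
mean_y = sum(y for _, y in candidates) / len(candidates)
```

Each candidate would then be scored against the stored object models, and the best-scoring one taken as the object position in the current frame.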

Proceedings ArticleDOI
03 Dec 2012
TL;DR: Experiments showed that the proposed method is effective in tracking objects under partial occlusion and changes in the object view angle, and can be applied on 3D video content captured by commercial stereo cameras, as well as 3D movies and 3D TV programs.
Abstract: A novel method is proposed for visual object tracking in stereo videos. The algorithm employs Local Steering Kernel features and 2-dimensional color-disparity histograms for object texture description. The proposed framework requires no information about the intrinsic and extrinsic parameters of the stereo camera system. Therefore, it can be applied on 3D video content captured by commercial stereo cameras, as well as 3D movies and 3D TV programs. Experiments showed that the proposed method is effective in tracking objects under partial occlusion and changes in the object view angle.

Proceedings ArticleDOI
12 Nov 2012
TL;DR: A Support Vector Machine (SVM) variant that makes use of robust statistics is proposed, and the use of statistically robust location and dispersion estimators is investigated in order to enhance the performance of an SVM-based facial expression recognition algorithm.
Abstract: In this paper, a new framework for facial expression recognition is presented. A Support Vector Machine (SVM) variant is proposed, which makes use of robust statistics. We investigate the use of statistically robust location and dispersion estimators in order to enhance the performance of a facial expression recognition algorithm based on support vector machines. The efficiency of the proposed method is tested for two-class and multi-class classification problems. In addition to the facial expression recognition experiments, we also conducted experiments on classification databases to provide evidence that our method outperforms state-of-the-art methods.

Proceedings ArticleDOI
25 Mar 2012
TL;DR: This paper is meant as a proof of concept regarding the application of standard 2D signal representation and feature extraction tools that have wide use in their respective fields to graph related pattern recognition tasks such as clustering.
Abstract: This paper is meant as a proof of concept regarding the application of standard 2D signal representation and feature extraction tools that have wide use in their respective fields to graph related pattern recognition tasks such as, in this case, clustering. By viewing the adjacency matrix of a graph as a 2-dimensional signal, we can apply 2D Discrete Cosine Transform (DCT) to it and use the relation between the adjacency matrix and the values of the DCT bases in order to cluster nodes into strongly connected components. By viewing the adjacency matrices of multiple graphs as feature vectors, we can apply Principal Components Analysis (PCA) to decorrelate them and achieve better clustering performance. Experimental results on synthetic data indicate that there is potential in the use of such techniques to graph analysis.
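Treating an adjacency matrix as a 2-dimensional signal, as the abstract above proposes, means a 2D Discrete Cosine Transform can be applied to it directly. The sketch below implements an orthonormal 2D DCT-II for a square matrix in plain Python and applies it to a toy graph; the normalization convention and the example graph are assumptions, and the clustering step built on the coefficients is not reproduced.

```python
import math


def dct2(matrix):
    """Orthonormal 2-D DCT-II of a square matrix given as a list of rows."""
    n = len(matrix)

    def alpha(k):
        # Scaling that makes the transform orthonormal (energy preserving).
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    # cos_table[u][i] = cos((2i + 1) * u * pi / (2n))
    cos_table = [
        [math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n)]
        for u in range(n)
    ]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for i in range(n):
                for j in range(n):
                    total += matrix[i][j] * cos_table[u][i] * cos_table[v][j]
            out[u][v] = alpha(u) * alpha(v) * total
    return out


# Toy undirected graph: nodes {0, 1} are linked and nodes {2, 3} are linked.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
coefficients = dct2(adjacency)
```

With this normalization the sum of squared coefficients equals the sum of squared adjacency entries, so no information is lost by moving to the transform domain.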

Proceedings ArticleDOI
01 Sep 2012
TL;DR: A Bayesian inference algorithm is introduced which is used to merge the information of both forward and backward tracking in order to refine the tracked region localization results.
Abstract: In this paper we propose a Bayesian framework for accurate object tracking in stereoscopic sequences. Object detection and forward tracking are first combined according to predefined rules to get a first set of tracked region candidates. Backward tracking is then applied to provide another set of possible object localizations. Moreover, this strategy is applied herein in stereoscopic video. We introduce a Bayesian inference algorithm which is used to merge the information of both forward and backward tracking in order to refine the tracked region localization results. Experiments, performed on face tracking, show that the proposed method provides higher tracking accuracy than a forward tracker.

Proceedings ArticleDOI
01 Sep 2012
TL;DR: A novel person identification method exploiting human motion information is proposed, in which persons are described by their poses during action execution and identified using Fuzzy Vector Quantization and Discriminant Learning.
Abstract: In this paper we propose a novel person identification method exploiting human motion information. Persons are described by using their poses during action execution. The identification process involves Fuzzy Vector Quantization and Discriminant Learning. When multiple cameras are used in the identification phase, the single-view identification results are combined by employing a Bayesian combination strategy. The proposed identification approach does not assume a known action class or a known number of capturing cameras in the identification phase. Experimental results on two publicly available video databases indicate the effectiveness of the proposed approach.