Author

Ioannis Pitas

Other affiliations: University of Bristol, University of York, University of Toronto
Bio: Ioannis Pitas is an academic researcher from the Aristotle University of Thessaloniki. The author has contributed to research in the topics of facial recognition systems and digital watermarking, has an h-index of 76, and has co-authored 795 publications receiving 24,787 citations. Previous affiliations of Ioannis Pitas include the University of Bristol and the University of York.


Papers
Proceedings ArticleDOI
16 Apr 2013
TL;DR: Fuzzy Vector Quantization is applied to the human body poses appearing in a video in order to obtain a compact video representation that is then used for person identification and action recognition.
Abstract: In this paper, we propose a person identification method exploiting human motion information. A Self-Organizing Neural Network is employed in order to determine a topographic map of representative human body poses. Fuzzy Vector Quantization is applied to the human body poses appearing in a video in order to obtain a compact video representation that is then used for person identification and action recognition. Two feedforward Artificial Neural Networks are trained to recognize the person ID and action class labels of a given test action video. When multiple cameras are used in the training and identification phases, their network outputs are combined by another feedforward network. Experimental results on two publicly available databases demonstrate the performance of the proposed person identification approach.
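As a rough illustration of the fuzzy vector quantization step, the sketch below computes fuzzy-c-means-style memberships of per-frame pose vectors against a codebook (e.g., the trained SOM neurons) and mean-pools them into a fixed-length video descriptor. The fuzzifier value and the mean pooling are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuzzy_vq_descriptor(frames, codebook, m=2.0, eps=1e-9):
    """Fuzzy Vector Quantization of per-frame pose vectors (sketch).

    frames:   (T, D) array, one pose vector per video frame
    codebook: (K, D) array of representative poses (e.g., SOM neurons)
    Returns a (K,) vector: mean fuzzy membership over all frames,
    used here as a compact fixed-length video representation.
    """
    # Squared Euclidean distance between every frame and every code vector
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1) + eps
    # Fuzzy-c-means-style memberships with fuzzifier m (assumed m = 2)
    u = d2 ** (-1.0 / (m - 1.0))
    u /= u.sum(axis=1, keepdims=True)
    # Average memberships over time -> video descriptor
    return u.mean(axis=0)
```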

12 citations

Book ChapterDOI
15 Sep 2010
TL;DR: Experiments conducted on the XM2VTS database demonstrate that PCA+CDA outperforms PCA, LDA and PCA+LDA in cross-validation inside the database; the behavior of these algorithms as the training set shrinks is also explored to demonstrate their robustness.
Abstract: In this paper, the problem of frontal view recognition on still images is confronted using subspace learning methods. The aim is to acquire the frontal images of a person in order to achieve better results in later face or facial expression recognition. For this purpose, we utilize a relatively new subspace learning technique, Clustering-based Discriminant Analysis (CDA), against two subspace learning techniques for dimensionality reduction that are well known in the literature, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). We also concisely describe spectral clustering, which is proposed in this work as a preprocessing step to the CDA algorithm. As classifiers, we use the K-Nearest Neighbor, the Nearest Centroid and the novel Nearest Cluster Centroid classifiers. Experiments conducted on the XM2VTS database demonstrate that PCA+CDA outperforms PCA, LDA and PCA+LDA in cross-validation inside the database. Finally, the behavior of these algorithms as the size of the training set decreases is explored to demonstrate their robustness.
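The Nearest Cluster Centroid decision rule can be sketched as follows, assuming PCA for dimensionality reduction and k-means (in place of the paper's spectral clustering) to split each class into sub-clusters; the CDA discriminant projection itself is omitted, so this is only an approximation of the pipeline, not the authors' method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def pca_nearest_cluster_centroid(X_tr, y_tr, X_te, n_pca=50, n_sub=3):
    """Sketch: PCA projection, per-class sub-clustering, then a
    Nearest Cluster Centroid classifier (a test sample takes the
    label of the class owning the closest sub-cluster centroid)."""
    pca = PCA(n_components=n_pca).fit(X_tr)
    Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

    centroids, labels = [], []
    for c in np.unique(y_tr):
        # k-means stands in for the paper's spectral clustering step
        km = KMeans(n_clusters=n_sub, n_init=10).fit(Z_tr[y_tr == c])
        centroids.append(km.cluster_centers_)
        labels += [c] * n_sub
    centroids, labels = np.vstack(centroids), np.array(labels)

    # Assign each test sample to the nearest sub-cluster centroid
    d = ((Z_te[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return labels[d.argmin(axis=1)]
```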

12 citations

Journal ArticleDOI
TL;DR: A long-term 2D tracking framework for the coverage of live outdoor (e.g., sports) events that is suitable for embedded system applications (e.g., Unmanned Aerial Vehicles) and allows continued target tracking once the target re-appears in the video stream, without tracker re-initialization.
Abstract: This paper presents a long-term 2D tracking framework for the coverage of live outdoor (e.g., sports) events that is suitable for embedded system applications (e.g., Unmanned Aerial Vehicles, UAVs). This application scenario requires 2D target (e.g., athlete, ball, bicycle, boat) tracking for visually assisting the UAV pilot (or cameraman) to maintain proper target framing, or even for actual 3D target following/localization when the drone flies autonomously. In these cases, the target to be tracked/followed may disappear from the UAV camera field of view due to fast 3D target motion, illumination changes, or visual target occlusions by obstacles, even if the UAV itself continues following it (either autonomously, by exploiting alternative target localization sensors, or by pilot maneuvering). Therefore, the 2D tracker should be able to recover from such situations. The proposed framework solves exactly this problem. Target occlusions are detected from the 2D tracker responses. Depending on the occlusion severity, the proposed framework decides whether to leave the tracking model un-updated or to employ target re-detection in a broader window. As a result, the proposed framework allows continued target tracking once the target re-appears in the video stream, without tracker re-initialization.
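The occlusion-handling control flow described above can be sketched as follows; `tracker`, `detector`, the method names and the confidence thresholds are all hypothetical placeholders for illustration, not the paper's actual interfaces or values.

```python
def long_term_tracking_step(tracker, detector, frame,
                            conf_hi=0.5, conf_lo=0.2):
    """Illustrative control logic for occlusion-aware long-term
    tracking: update the model when confident, freeze it under mild
    occlusion, and re-detect in a broader window when the target
    is lost. All objects and thresholds here are assumptions."""
    bbox, conf = tracker.track(frame)          # 2D tracker response
    if conf >= conf_hi:
        tracker.update_model(frame, bbox)      # normal operation: adapt model
        return bbox
    if conf >= conf_lo:
        return bbox                            # mild occlusion: do not update model
    # Severe occlusion: search a broader window around the last location
    redetected = detector.search(frame, around=bbox, scale=3.0)
    if redetected is not None:
        tracker.resume(redetected)             # continue without full re-initialization
        return redetected
    return None                                # target still not visible
```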

12 citations

Proceedings ArticleDOI
01 Mar 2017
TL;DR: This work presents a method that selects as key-frames those video frames able to optimally reconstruct the entire video, modelling the reconstruction algebraically as a Column Subset Selection Problem (CSSP) so that the extracted key-frames correspond to elementary visual building blocks.
Abstract: Summarization of videos depicting human activities is a timely problem with important applications (e.g., in the domains of surveillance or film/TV production) that steadily becomes more relevant. Research on video summarization has mainly relied on global clustering or local (frame-by-frame) saliency methods to provide automated algorithmic solutions for key-frame extraction. This work presents a method based on selecting as key-frames video frames able to optimally reconstruct the entire video. The novelty lies in modelling the reconstruction algebraically as a Column Subset Selection Problem (CSSP), resulting in the extraction of key-frames that correspond to elementary visual building blocks. The problem is formulated under an optimization framework and approximately solved via a genetic algorithm. The proposed video summarization method is evaluated using a publicly available annotated dataset and an objective evaluation metric. According to the quantitative results, it clearly outperforms the typical clustering approach.
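The CSSP objective is the Frobenius-norm error of reconstructing the frame-feature matrix from a chosen column (frame) subset. A minimal sketch follows, with a random-subset search standing in for the paper's genetic algorithm, which is not reproduced here.

```python
import numpy as np

def cssp_error(A, idx):
    """Reconstruction error ||A - C C^+ A||_F for the Column Subset
    Selection Problem, where A is a (features x frames) matrix and
    C = A[:, idx] holds the candidate key-frame columns."""
    C = A[:, idx]
    return np.linalg.norm(A - C @ np.linalg.pinv(C) @ A)

def random_search_keyframes(A, k, iters=500, seed=0):
    """Toy stand-in for the paper's genetic algorithm: sample random
    k-subsets of frames and keep the one with the lowest CSSP error."""
    rng = np.random.default_rng(seed)
    best_idx, best_err = None, np.inf
    for _ in range(iters):
        idx = rng.choice(A.shape[1], size=k, replace=False)
        err = cssp_error(A, idx)
        if err < best_err:
            best_idx, best_err = idx, err
    return np.sort(best_idx), best_err
```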

12 citations

Proceedings ArticleDOI
04 Sep 2006
TL;DR: It is argued that the mouth region of a speaking person exhibits a large deviation and increased values in the number of low-intensity pixels, and that these statistics can be used as visual cues for detecting speech.
Abstract: In recent research efforts, the integration of visual cues into speech analysis systems has been proposed with favorable results. This paper introduces a novel approach to lip activity and visual speech detection. We argue that the mouth region of a speaking person exhibits a large deviation and increased values in the number of low-intensity pixels, and that these statistics can be used as visual cues for detecting speech. We describe a statistical algorithm, based on detection theory, for the efficient characterization of speaking and silent intervals in video sequences. The proposed system has been tested on a number of video sequences with encouraging experimental results. Potential applications include speech intent detection, speaker determination and semantic video annotation.
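A minimal sketch of the low-intensity-pixel cue might look as follows; the intensity and decision thresholds are illustrative assumptions, and the paper's actual detection-theoretic test is not reproduced here.

```python
import numpy as np

def visual_speech_detection(mouth_rois, dark_thresh=60, win=25,
                            mean_thresh=50.0, std_thresh=15.0):
    """Sketch of a low-intensity-pixel cue for visual speech detection.
    mouth_rois: (T, H, W) grayscale mouth regions, one per frame.
    All threshold values are illustrative assumptions."""
    # Number of dark pixels per frame (an open mouth cavity appears dark)
    dark = (np.asarray(mouth_rois) < dark_thresh).sum(axis=(1, 2)).astype(float)
    speaking = np.zeros(len(dark), dtype=bool)
    for t in range(len(dark)):
        w = dark[max(0, t - win // 2): t + win // 2 + 1]
        # Speaking intervals show raised and strongly fluctuating counts
        speaking[t] = w.mean() > mean_thresh and w.std() > std_thresh
    return speaking
```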

12 citations


Cited by
Journal ArticleDOI
08 Dec 2001 - BMJ
TL;DR: A personal reflection on i, the square root of minus one, which at first seemed an odd beast: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
TL;DR: In this paper, the authors provide an up-to-date critical survey of still- and video-based face recognition research, and offer some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations

Journal ArticleDOI
TL;DR: In this article, the authors categorize and evaluate face detection algorithms and discuss relevant issues such as data collection, evaluation metrics and benchmarking, and conclude with several promising directions for future research.
Abstract: Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face, regardless of its 3D position, orientation and lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in size, shape, color and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

3,894 citations