
Showing papers by "Ioannis Pitas published in 2009"


Proceedings ArticleDOI
12 Nov 2009
TL;DR: The database has been created using a convergent eight-camera setup to produce high-definition multi-view videos, where each video depicts one of eight persons performing one of twelve different human motions.
Abstract: In this paper a new multi-view/3D human action/interaction database is presented. The database has been created using a convergent eight-camera setup to produce high-definition multi-view videos, where each video depicts one of eight persons performing one of twelve different human motions. Various types of motions have been recorded, i.e., scenes where one person performs a specific movement, scenes where a person executes different movements in succession and scenes where two persons interact with each other. Moreover, the subjects have different body sizes and clothing and are of different sexes, nationalities, etc. The multi-view videos have been further processed to produce a 3D mesh at each frame describing the respective 3D human body surface. To increase the applicability of the database, a multi-view video depicting each person sequentially performing the six basic facial expressions, separated by the neutral expression, has also been recorded. The database is freely available for research purposes.

180 citations


Journal ArticleDOI
TL;DR: This paper presents a new approach for the segmentation of color textured images, which is based on a novel energy function, derived by exploiting an intermediate step of modal analysis that is utilized in order to describe and analyze the deformations of a 3-D deformable surface model.
Abstract: This paper presents a new approach for the segmentation of color textured images, which is based on a novel energy function. The proposed energy function, which expresses the local smoothness of an image area, is derived by exploiting an intermediate step of modal analysis that is utilized in order to describe and analyze the deformations of a 3-D deformable surface model. The external forces that attract the 3-D deformable surface model combine the intensity of the image pixels with the spatial information of local image regions. The proposed image segmentation algorithm has two steps. First, a color quantization scheme, which is based on the node displacements of the deformable surface model, is utilized in order to decrease the number of colors in the image. Then, the proposed energy function is used as a criterion for a region growing algorithm. The final segmentation of the image is derived by a region merge approach. The proposed method was applied to the Berkeley segmentation database. The obtained results show good segmentation robustness when compared with other state-of-the-art image segmentation algorithms.
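The region-growing step lends itself to a compact illustration. Below is a minimal sketch, assuming a simple color-distance energy in place of the paper's modal-analysis-derived energy function; the threshold and 4-connectivity are illustrative choices, not the authors' values.

```python
# Minimal region-growing sketch under a generic smoothness criterion.
# The paper's energy comes from modal analysis of a 3-D deformable
# surface; here a simple color-difference energy stands in for it.
import numpy as np

def region_growing(image, seed, energy_threshold=20.0):
    """Grow a region from `seed` while the local energy stays low.

    image: (H, W, 3) float array (e.g. a color-quantized image).
    seed: (row, col) starting pixel.
    """
    h, w, _ = image.shape
    visited = np.zeros((h, w), dtype=bool)
    visited[seed] = True
    stack = [seed]
    region_mean = image[seed].astype(float)
    region_size = 1
    while stack:
        r, c = stack.pop()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and not visited[rr, cc]:
                # "Energy" of adding the pixel: distance to the region mean.
                energy = np.linalg.norm(image[rr, cc] - region_mean)
                if energy < energy_threshold:
                    visited[rr, cc] = True
                    # Update the running mean of the growing region.
                    region_mean = (region_mean * region_size
                                   + image[rr, cc]) / (region_size + 1)
                    region_size += 1
                    stack.append((rr, cc))
    return visited
```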

67 citations


Journal ArticleDOI
TL;DR: A novel method for eye and mouth detection and eye center and mouth corner localization, based on geometrical information, which has been tested on the XM2VTS and BioID databases, with very good results.

65 citations


Journal ArticleDOI
TL;DR: It is argued that the increased average value and standard deviation of the number of low-intensity pixels in the mouth region of a speaking person can be used as visual cues for detecting visual speech.
Abstract: In this letter, we introduce a novel approach for lip activity detection and speaker detection, using solely visual information. The main idea in this work is to apply signal detection algorithms to a simple and easily extracted feature from the mouth region. We argue that the increased average value and standard deviation of the number of low-intensity pixels that the mouth region of a speaking person exhibits can be used as visual cues for detecting visual speech. We then proceed to derive a statistical algorithm that utilizes this fact for the efficient characterization of visual speech and silence in video sequences. Furthermore, we employ the lip activity detection method in order to determine the active speaker(s) in a multi-person environment.
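To make the cue concrete, here is a minimal sketch of the dark-pixel feature and a windowed mean/standard-deviation test; the intensity threshold, window length and decision thresholds are illustrative assumptions, not the values derived in the letter.

```python
# Sketch of the low-intensity-pixel cue for lip activity detection.
import numpy as np

def low_intensity_count(mouth_roi, intensity_threshold=60):
    """Number of dark pixels (open mouth / oral cavity) in a grayscale ROI."""
    return int(np.sum(mouth_roi < intensity_threshold))

def detect_lip_activity(mouth_rois, window=25, mean_thr=40.0, std_thr=15.0):
    """Classify each frame as speech (True) or silence (False).

    mouth_rois: list of (H, W) grayscale mouth-region crops, one per frame.
    A speaking mouth raises both the mean and the variability of the
    dark-pixel count within a temporal window.
    """
    counts = np.array([low_intensity_count(r) for r in mouth_rois], float)
    labels = np.zeros(len(counts), dtype=bool)
    for t in range(len(counts)):
        seg = counts[max(0, t - window // 2): t + window // 2 + 1]
        labels[t] = seg.mean() > mean_thr and seg.std() > std_thr
    return labels
```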

47 citations


Journal ArticleDOI
TL;DR: A novel class of multiclass classifiers inspired by the optimization of the Fisher discriminant ratio and the support vector machine (SVM) formulation is introduced, which is applied to face recognition and other classification problems using Mercer's kernels.
Abstract: In this paper, a novel class of multiclass classifiers inspired by the optimization of the Fisher discriminant ratio and the support vector machine (SVM) formulation is introduced. The optimization problem of the so-called minimum within-class variance multiclass classifiers (MWCVMC) is formulated and solved in arbitrary Hilbert spaces, defined by Mercer's kernels, in order to find multiclass decision hyperplanes/surfaces. Afterwards, MWCVMCs are solved using indefinite kernels and dissimilarity measures via pseudo-Euclidean embedding. The power of the proposed approach is first demonstrated on the problem of recognizing the seven basic facial expression classes (i.e., anger, disgust, fear, happiness, sadness, and surprise, plus the neutral state) in the presence of partial facial occlusion, by using a pseudo-Euclidean embedding of Hausdorff distances and the MWCVMC. The experiments indicated a recognition accuracy of up to 99%. The MWCVMC classifiers are also applied to face recognition and other classification problems using Mercer's kernels.
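The pseudo-Euclidean embedding step can be illustrated compactly. The sketch below follows the standard construction (centering the squared dissimilarities and eigendecomposing), not necessarily the exact formulation in the paper:

```python
# Pseudo-Euclidean embedding of a (possibly indefinite) dissimilarity
# matrix, e.g. pairwise Hausdorff distances, into vectors usable by a
# linear classifier.
import numpy as np

def pseudo_euclidean_embedding(D, keep=None):
    """Embed an n x n symmetric dissimilarity matrix D.

    Returns (X, signature): X holds the embedded vectors (rows); the
    signature records the sign of each retained eigenvalue, since inner
    products in a pseudo-Euclidean space weight dimensions by +/-1.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # (pseudo-)Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(-np.abs(eigvals))     # largest |eigenvalue| first
    if keep is not None:
        order = order[:keep]
    lam, V = eigvals[order], eigvecs[:, order]
    X = V * np.sqrt(np.abs(lam))             # scale axes by sqrt(|lambda|)
    return X, np.sign(lam)
```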

33 citations


Proceedings ArticleDOI
28 Jun 2009
TL;DR: Evaluation of the proposed algorithm for view-independent human movement representation and recognition, exploiting the rich information contained in multi-view videos, shows that it is particularly efficient and robust, and can achieve good recognition performance.
Abstract: In this paper a novel method for view-independent human movement representation and recognition, exploiting the rich information contained in multi-view videos, is proposed. The binary masks of a multi-view posture image are first vectorized and concatenated, and the view correspondence problem between training and test samples is solved using the circular shift invariance property of the discrete Fourier transform (DFT) magnitudes. Then, using fuzzy vector quantization (FVQ) and linear discriminant analysis (LDA), different movements are represented and classified. This method allows view-independent movement recognition without the use of calibrated cameras, a priori view correspondence information or 3D model reconstruction. A multi-view video database has been constructed for the assessment of the proposed algorithm. Evaluation of this algorithm on the new database shows that it is particularly efficient and robust, and can achieve good recognition performance.
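The circular-shift invariance that resolves the view correspondence is easy to demonstrate. The following sketch (with toy mask sizes) shows that the DFT magnitude of the concatenated mask vector is unchanged when the camera order is rotated:

```python
# Circular-shift-invariant multi-view descriptor: masks from the K views
# are vectorized and concatenated, and the DFT magnitude of the result is
# unchanged when the camera order is circularly rotated.
import numpy as np

def multiview_descriptor(masks):
    """masks: list of K binary (H, W) posture masks, one per camera view."""
    v = np.concatenate([m.astype(float).ravel() for m in masks])
    return np.abs(np.fft.fft(v))  # invariant to circular shifts of v

# Rotating the camera order circularly shifts the concatenated vector by a
# whole block, so the descriptor is (up to numerics) identical:
K, H, W = 8, 4, 4
rng = np.random.default_rng(0)
masks = [rng.integers(0, 2, (H, W)) for _ in range(K)]
d1 = multiview_descriptor(masks)
d2 = multiview_descriptor(masks[3:] + masks[:3])  # rotated view order
assert np.allclose(d1, d2)
```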

33 citations


Journal ArticleDOI
TL;DR: Three issues are addressed: whether features having super- and sub-Gaussian distributions facilitate facial expression classification; whether a nonlinear mixture of independent sources improves the classification accuracy; and whether an increased "amount" of sparseness yields more accurate facial expression recognition.
Abstract: Independent component analysis (ICA) and Gabor wavelets have been shown to extract the most discriminating features for facial action unit classification when combined with either a cosine similarity measure (CSM) classifier or support vector machines (SVMs). So far, only the ICA approach based on the InfoMax principle has been tested for facial expression recognition. In this paper, in addition to the InfoMax approach, another five ICA approaches are used to extract features from two facial expression databases. In particular, the Extended InfoMax ICA, the undercomplete ICA, and the nonlinear kernel-ICA approaches are exploited for facial expression representation for the first time. When applied to images, ICA treats them as mixtures of independent sources and decomposes them into an independent basis and the corresponding mixture coefficients. Two architectures for representing the images can be employed, yielding either independent and sparse basis images or independent and sparse distributions of image representation coefficients. After feature extraction, facial expression classification is performed with the help of either a CSM classifier or an SVM classifier. A detailed comparative study is made with respect to the accuracy offered by each classifier. The correlation between the accuracy and the mutual information of independent components, or their kurtosis, is evaluated. Statistically significant correlations between the aforementioned quantities are identified. Several issues are addressed in the paper: (i) whether features having super- and sub-Gaussian distributions facilitate facial expression classification; (ii) whether a nonlinear mixture of independent sources improves the classification accuracy; and (iii) whether an increased "amount" of sparseness yields more accurate facial expression recognition. In addition, performance enhancements obtained by employing a leave-one-set-of-expressions-out protocol and subspace selection are studied. Statistically significant differences in accuracy between classifiers using several feature extraction methods are also indicated.
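As a concrete illustration of the second architecture (independent representation coefficients) followed by CSM classification, here is a minimal sketch using scikit-learn's FastICA as a stand-in for the ICA variants compared in the paper; the function names and dimensions are illustrative:

```python
# ICA feature extraction plus a cosine similarity measure (CSM) classifier.
import numpy as np
from sklearn.decomposition import FastICA

def fit_ica(train_images, n_components=30, seed=0):
    """train_images: (N, P) matrix, one vectorized face image per row."""
    ica = FastICA(n_components=n_components, random_state=seed)
    coeffs = ica.fit_transform(train_images)   # (N, n_components)
    return ica, coeffs

def csm_classify(ica, train_coeffs, train_labels, test_image):
    """Assign the label of the training sample with maximal cosine similarity."""
    c = ica.transform(test_image.reshape(1, -1)).ravel()
    sims = train_coeffs @ c / (
        np.linalg.norm(train_coeffs, axis=1) * np.linalg.norm(c) + 1e-12)
    return train_labels[int(np.argmax(sims))]
```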

25 citations


Journal ArticleDOI
TL;DR: A novel approach for estimating 3-D head pose in single-view video sequences using a feature vector which is a byproduct of the equations that govern the deformation of the surface model used in the tracking.
Abstract: This paper presents a novel approach for estimating 3-D head pose in single-view video sequences. Following initialization by a face detector, a tracking technique that utilizes a 3-D deformable surface model to approximate the facial image intensity is used to track the face in the video sequence. Head pose estimation is performed by using a feature vector which is a byproduct of the equations that govern the deformation of the surface model used in the tracking. The aforementioned vector is used as input in a radial basis function interpolation network in order to estimate the 3-D head pose. The proposed method was applied to the IDIAP head pose estimation database. The obtained results show that the method can estimate the head direction vector with very good accuracy.
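The final regression step can be sketched with an off-the-shelf RBF interpolator; the feature dimensionality, kernel choice and synthetic data below are illustrative assumptions, not the paper's setup:

```python
# RBF interpolation network mapping the tracker's deformation feature
# vector to head pose angles.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
train_features = rng.normal(size=(200, 16))        # deformation feature vectors
train_poses = rng.uniform(-60, 60, size=(200, 3))  # (pan, tilt, roll) in degrees

# Fit an RBF interpolant from features to pose angles.
rbf = RBFInterpolator(train_features, train_poses, kernel='thin_plate_spline')

test_feature = rng.normal(size=(1, 16))
pan, tilt, roll = rbf(test_feature)[0]             # estimated head pose
```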

25 citations


Book ChapterDOI
01 Jan 2009
TL;DR: This chapter reviews the main application domains of watermarking and discusses the principles and techniques devised for two major application areas, namely copyright protection and authentication.
Abstract: Publisher Summary Watermarking is the practice of imperceptibly altering a piece of data in order to embed information about the data. According to this definition, there are two important characteristics of watermarking. First, information embedding should not cause perceptible changes to the host medium. Second, the message should be related to the host medium. In this sense, watermarking techniques form a subset of information hiding techniques, which also include cases where the hidden information is not related to the host medium. This chapter reviews the main application domains of watermarking. Properties and classification schemes of watermarking techniques are presented, followed by the basic functional modules of a watermarking scheme. Further, the principles and techniques devised for two major application areas, namely copyright protection and authentication, are discussed.

20 citations


Proceedings ArticleDOI
07 Nov 2009
TL;DR: Using a publicly available database, promising results are provided regarding the discrimination power of the different movements for the human identification task, and it is indicated that the combination of the individual classifiers may increase the robustness of the human recognition algorithm.
Abstract: In this paper a multi-modal method for human identification that exploits the discrimination power of several movement types performed by the same human is proposed. Utilizing a fuzzy vector quantization (FVQ) and linear discriminant analysis (LDA) based algorithm, an unknown movement is first classified, and, then, the person performing the movement is recognized by a movement-specific person classifier. In case the unknown person performs more than one movement, a multi-modal algorithm combines the results of the individual classifiers to yield the final decision for the identity of the unknown human. Using a publicly available database, we provide promising results regarding the discrimination power of the different movements for the human identification task, and we indicate that the combination of the individual classifiers may increase the robustness of the human recognition algorithm.
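A minimal sketch of the fusion logic, assuming score-level combination by summation (one plausible rule; the paper's exact combination scheme may differ):

```python
# Multi-modal fusion of movement-specific person classifiers.
import numpy as np

def identify_person(observations, movement_classifier, person_experts):
    """observations: list of feature vectors, one per performed movement.
    movement_classifier(x) -> movement label m.
    person_experts[m](x) -> score vector over the enrolled identities.
    """
    fused = None
    for x in observations:
        m = movement_classifier(x)              # which movement is this?
        scores = np.asarray(person_experts[m](x), dtype=float)
        fused = scores if fused is None else fused + scores
    return int(np.argmax(fused))                # identity with highest fused score
```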

19 citations


Proceedings ArticleDOI
07 Nov 2009
TL;DR: The Candide facial grid is utilized and Principal Components Analysis (PCA) is applied to find the two eigenvectors of the model vertices, which are used to define a new coordinate system onto which the vertices are mapped.
Abstract: In this paper, we propose a new method for facial expression recognition. We utilize the Candide facial grid and apply Principal Components Analysis (PCA) to find the two eigenvectors of the model vertices. These eigenvectors, along with the barycenter of the vertices, are used to define a new coordinate system onto which the vertices are mapped. Support Vector Machines (SVMs) are then used for the facial expression classification task. The method is invariant to in-plane translation and rotation as well as scaling of the face, and achieves very satisfactory results.
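The coordinate normalization can be sketched in a few lines; the scale normalization by the largest eigenvalue below is an illustrative choice:

```python
# Barycenter/PCA normalization of the Candide vertex cloud, removing
# in-plane translation, rotation and (after rescaling) scale.
import numpy as np

def normalize_grid(vertices):
    """vertices: (N, 2) array of 2-D Candide vertex positions
    (assumed non-degenerate)."""
    barycenter = vertices.mean(axis=0)
    centered = vertices - barycenter                 # remove translation
    cov = centered.T @ centered / len(vertices)
    eigvals, eigvecs = np.linalg.eigh(cov)           # the two principal axes
    mapped = centered @ eigvecs                      # remove rotation
    return mapped / np.sqrt(eigvals.max())           # remove scale

# The normalized coordinates can then be fed to an SVM classifier.
```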

Book ChapterDOI
01 Jan 2009
TL;DR: Theoretical and experimental evidence suggests that the HVS performs face analysis in a structured and hierarchical way, where both representations have their own contribution and goal.
Abstract: Two main theories exist with respect to face encoding and representation in the human visual system (HVS). The first one refers to the dense (holistic) representation of the face, where faces have "holon"-like appearance. The second one claims that a more appropriate face representation is given by a sparse code, where only a small fraction of the neural cells corresponding to face encoding is activated. Theoretical and experimental evidence suggests that the HVS performs face analysis (encoding, storing, face recognition, facial expression recognition) in a structured and hierarchical way, where both representations have their own contribution and goal. According to neuropsychological experiments, it seems that encoding for face recognition relies on holistic image representation, while a sparse image representation is used for facial expression analysis and classification. From the computer vision perspective, the techniques developed for automatic face and facial expression recognition fall into the same two representation types. As in neuroscience, the techniques which perform better for face recognition yield a holistic image representation, while those techniques suitable for facial expression recognition use a sparse or local image representation. The proposed mathematical models of image formation and encoding try to simulate the efficient storing, organization and coding of data in the human cortex. This is equivalent to embedding constraints in the model design regarding dimensionality reduction, redundant information minimization, mutual information minimization, non-negativity constraints, class information, etc. The presented techniques are applied as a feature extraction step followed by a classification method, which also heavily influences the recognition results.



Journal ArticleDOI
TL;DR: A fast algorithm has been developed to quickly and robustly perform these two tasks on very large video databases, and the suitability of the proposed approach for very large databases was tested using (artificial) data corresponding to hundreds or thousands of hours of video.
Abstract: The management of large video databases, especially those containing motion picture and television data, is a major contemporary challenge. A very significant tool for this management is the ability to retrieve those segments that are perceptually similar to a query segment. Another similar but equally important task is determining if a query segment is a (possibly modified) copy of part of a video in the database. The basic way to perform these two tasks is to characterize each video segment with a unique representation called a signature. Using semantic information for the construction of the signatures is a good way to ensure robustness in retrieval and fingerprinting. Here a ubiquitous semantic feature, namely the existence and identity of human faces, will be used to construct the signature. A fast algorithm has been developed to quickly and robustly perform these two tasks on very large video databases. The prerequisite face recognition was performed by a commercial system. Having verified the basic efficacy of our algorithm on a database of real video from motion pictures and television series, we then proceed to further explore its performance in an artificial digital video database, which was created using a probabilistic model of the video creation process. This enabled us to explore variations in performance based on parameters that were impossible to control in a real video database. Furthermore, the suitability of the proposed approach for very large databases was tested using (artificial) data corresponding to hundreds or thousands of hours of video.
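As a toy illustration of a face-based signature, the sketch below reduces each segment to a histogram of recognized actor identities and compares histograms by intersection; this is a simplified stand-in, not the paper's exact signature or matching algorithm:

```python
# Face-identity histogram signatures for video retrieval/fingerprinting.
import numpy as np

def face_signature(face_ids, n_actors):
    """face_ids: list of recognized actor indices appearing in the segment."""
    hist = np.bincount(face_ids, minlength=n_actors).astype(float)
    return hist / max(hist.sum(), 1.0)

def similarity(sig_a, sig_b):
    return float(np.minimum(sig_a, sig_b).sum())  # histogram intersection

# Hypothetical database of segment signatures and a query segment:
database = {"ep01": face_signature([0, 0, 1, 2], n_actors=5),
            "ep02": face_signature([3, 3, 4], n_actors=5)}
query = face_signature([0, 1, 1, 2], n_actors=5)
best_match = max(database, key=lambda k: similarity(database[k], query))
```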

Proceedings ArticleDOI
28 Jun 2009
TL;DR: A novel method is proposed as a solution to the problem of frontal view recognition from multi-view image sequences, aiming to correctly identify the view that corresponds to the camera placed in front of a person, or the camera whose view is closest to a frontal one.
Abstract: In this paper, a novel method is proposed as a solution to the problem of frontal view recognition from multi-view image sequences. Our aim is to correctly identify the view that corresponds to the camera placed in front of a person, or the camera whose view is closest to a frontal one. By doing so, frontal face images of the person can be acquired, in order to be used in face or facial expression recognition techniques that require frontal faces to achieve a satisfactory result. The proposed method first employs the Discriminant Non-Negative Matrix Factorization (DNMF) algorithm on the input images acquired from every camera. The output of the algorithm is then used as input to a Support Vector Machines (SVMs) system that classifies the head poses acquired from the cameras into two classes, corresponding to frontal or non-frontal pose. Experiments conducted on the IDIAP database demonstrate that the proposed method achieves an accuracy of 98.6% in frontal view recognition.
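A minimal sketch of the pipeline, using scikit-learn's plain NMF as a stand-in for the discriminant variant (DNMF) and synthetic data with assumed shapes:

```python
# NMF encoding followed by a binary SVM for frontal vs. non-frontal pose.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((100, 32 * 32))      # vectorized head images (non-negative)
y_train = rng.integers(0, 2, 100)         # 1 = frontal, 0 = non-frontal

nmf = NMF(n_components=25, init='nndsvda', max_iter=500, random_state=0)
H_train = nmf.fit_transform(X_train)      # part-based encoding of each image

svm = SVC(kernel='rbf').fit(H_train, y_train)

# One test image per camera; pick the camera most confidently frontal.
X_test = rng.random((8, 32 * 32))
frontal_camera = int(np.argmax(svm.decision_function(nmf.transform(X_test))))
```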

Proceedings ArticleDOI
08 Jul 2009
TL;DR: The proposed method uses a novel pose estimation algorithm based on mutual information to extract any required facial pose from video sequences and outperforms a PCA reconstruction method which was used as a benchmark.
Abstract: Estimation of the facial pose in video sequences is one of the major issues in many vision systems, such as face-based biometrics and scene understanding, among others. The proposed method uses a novel pose estimation algorithm based on mutual information to extract any required facial pose from video sequences. The method extracts the poses automatically and classifies them according to view angle. Experimental results on the XM2VTS video database indicated a pose classification rate of 99.2%, and showed that the method outperforms a PCA reconstruction method used as a benchmark.
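The mutual information measure at the core of the comparison can be sketched with a joint intensity histogram; the bin count below is an illustrative choice:

```python
# Histogram-based mutual information between two same-size grayscale images.
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """MI (in nats) between the intensity distributions of two images."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                                  # avoid log(0)
    return float(np.sum(pxy[nz]
                        * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))

# A frame can then be assigned the pose of the reference template that
# maximizes mutual information with it.
```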

Journal ArticleDOI
TL;DR: This work proposes a new technique that combines localization and invertibility in fragile watermarking, and guarantees that no malicious attacks will manage to create information leaks.
Abstract: Fragile watermarking is a popular method for image authentication. In such schemes, a fragile signal that is sensitive to manipulations is embedded in the image, so that it becomes undetectable after any modification of the original work. Most algorithms focus either on the ability to retrieve the original work after watermark detection (invertibility) or on detecting which image parts have been altered (localization). Furthermore, the majority of fragile watermarking schemes suffer from robustness flaws. We propose a new technique that combines localization and invertibility. Moreover, watermark dependency on the original image and the non-linear watermark embedding procedure guarantee that no malicious attacks will manage to create information leaks.
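For illustration, here is a minimal block-based fragile watermark that provides tamper localization only; the paper's scheme additionally achieves invertibility and non-linear, image-dependent embedding, which this sketch does not attempt:

```python
# Block-based fragile watermark with tamper localization. Assumes a
# grayscale uint8 image whose dimensions are multiples of BLOCK.
import hashlib
import numpy as np

BLOCK = 8

def block_mark(block, key=b"secret"):
    """One authentication bit per pixel, derived from the block's MSBs."""
    msb = (block >> 1).tobytes()                 # ignore the LSB plane
    digest = hashlib.sha256(key + msb).digest()
    bits = np.unpackbits(np.frombuffer(digest, np.uint8))
    return bits[:block.size].reshape(block.shape)

def embed(image):
    out = image.copy()
    for r in range(0, out.shape[0], BLOCK):
        for c in range(0, out.shape[1], BLOCK):
            b = out[r:r + BLOCK, c:c + BLOCK]
            b[:] = (b & 0xFE) | block_mark(b)    # write mark into LSBs
    return out

def tampered_blocks(image):
    """Return (row, col) of blocks whose LSBs no longer match their mark."""
    bad = []
    for r in range(0, image.shape[0], BLOCK):
        for c in range(0, image.shape[1], BLOCK):
            b = image[r:r + BLOCK, c:c + BLOCK]
            if not np.array_equal(b & 1, block_mark(b)):
                bad.append((r // BLOCK, c // BLOCK))
    return bad
```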

Proceedings ArticleDOI
28 Jun 2009
TL;DR: This paper employs a generative probabilistic model, namely Latent Dirichlet Allocation (LDA), so as to capture latent aspects of a video, using facial semantic information derived from the video, to develop a fingerprinting (replica detection) framework.
Abstract: This paper investigates the possibility of extracting latent aspects of a video, using visual information about humans (e.g. actors' faces), in order to develop a fingerprinting (replica detection) framework. We employ a generative probabilistic model, namely Latent Dirichlet Allocation (LDA), so as to capture latent aspects of a video, using facial semantic information derived from the video. We use the bag-of-words concept (bag-of-faces in our case) in order to ensure exchangeability of the latent variables (e.g. topics). The video topics are modeled as a mixture of distributions of faces in each video. This generative probabilistic model has already been used in the case of text modeling with good results. Experimental results provide evidence that the proposed method performs very efficiently for video fingerprinting.
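A minimal sketch of the bag-of-faces modeling with scikit-learn's LatentDirichletAllocation; the vocabulary size, topic count and similarity threshold are illustrative assumptions:

```python
# Bag-of-faces topic modeling for replica detection.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(50, 200))   # 50 videos x 200 face "words"

lda = LatentDirichletAllocation(n_components=10, random_state=0)
topics = lda.fit_transform(counts)             # per-video topic mixtures
topics /= topics.sum(axis=1, keepdims=True)

def is_replica(i, j, thr=0.9):
    """Flag videos i and j as replicas if their topic mixtures are similar."""
    a, b = topics[i], topics[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))) > thr
```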

Proceedings ArticleDOI
23 Oct 2009
TL;DR: This paper presents a novel approach for estimating 3D head pose in single-view video sequences acquired by an uncalibrated camera and demonstrates that the method can estimate the head pose with satisfying accuracy.
Abstract: This paper presents a novel approach for estimating 3D head pose in single-view video sequences acquired by an uncalibrated camera. Following the initialization by a face detector, a tracking technique localizes the faces in each frame in the video sequence. Head pose estimation is performed by using a structure from motion and self-calibration technique in a sequential way. The proposed method was applied to the IDIAP database that contains head pose ground truth data. The obtained results demonstrate that the method can estimate the head pose with satisfying accuracy.

Proceedings ArticleDOI
04 Feb 2009
TL;DR: Two fingerprinting approaches are reviewed in this paper: an image fingerprinting technique that makes use of color and texture descriptors, R-trees and Linear Discriminant Analysis (LDA), and a two-step, coarse-to-fine video fingerprinting method that involves color-based descriptors, R-trees and a frame-based voting procedure.
Abstract: Multimedia fingerprinting, also known as robust/perceptual hashing and replica detection, is an emerging technology that can be used as an alternative to watermarking for the efficient Digital Rights Management (DRM) of multimedia data. Two fingerprinting approaches are reviewed in this paper. The first is an image fingerprinting technique that makes use of color and texture descriptors, R-trees and Linear Discriminant Analysis (LDA). The second is a two-step, coarse-to-fine video fingerprinting method that involves color-based descriptors, R-trees and a frame-based voting procedure. Experimental performance evaluation is provided for both methods.

Proceedings ArticleDOI
23 Oct 2009
TL;DR: Experimental results show that the proposed methodology provides a good solution to the facial expression recognition problem.
Abstract: This paper presents a novel facial expression recognition methodology. In order to classify the expression of a test face into one of seven predetermined facial expression classes, multiple two-class classification tasks are carried out. For each such task, a unique set of features is identified that is enhanced, in terms of its ability to help produce a proper separation between the two specific classes. The selection of these sets of features is accomplished by making use of a class separability measure that is utilized in an iterative process. Fisher's linear discriminant is employed in order to produce the separation between each pair of classes and train each two-class classifier. In order to combine the classification results from all two-class classifiers, the 'voting' classifier-decision fusion process is employed. The standard JAFFE database is utilized in order to evaluate the performance of this algorithm. Experimental results show that the proposed methodology provides a good solution to the facial expression recognition problem.
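The one-versus-one scheme with voting can be sketched with scikit-learn; the per-pair iterative feature selection described above is omitted here, and the data are synthetic:

```python
# One-vs-one Fisher discriminant classifiers combined by voting.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.multiclass import OneVsOneClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(210, 40))            # feature vectors (illustrative)
y = np.repeat(np.arange(7), 30)           # 7 expression classes

# OneVsOneClassifier trains one two-class FLD per class pair (21 in total)
# and combines their decisions by voting, as in the methodology above.
clf = OneVsOneClassifier(LinearDiscriminantAnalysis()).fit(X, y)
predicted_expression = clf.predict(X[:1])[0]
```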


Book ChapterDOI
16 Sep 2009
TL;DR: Experimental results show that the proposed methodology provides a good solution to the facial expression recognition problem.
Abstract: This paper presents a novel facial expression recognition methodology. In order to classify the expression of a test face into one of seven predetermined facial expression classes, multiple two-class classification tasks are carried out. For each such task, a unique set of features is identified that is enhanced, in terms of its ability to help produce a proper separation between the two specific classes. The selection of these sets of features is accomplished by making use of a class separability measure that is utilized in an iterative process. Fisher's linear discriminant is employed in order to produce the separation between each pair of classes and train each two-class classifier. In order to combine the classification results from all two-class classifiers, the 'voting' classifier-decision fusion process is employed. The standard JAFFE database is utilized in order to evaluate the performance of this algorithm. Experimental results show that the proposed methodology provides a good solution to the facial expression recognition problem.

Book ChapterDOI
16 Sep 2009
TL;DR: Using a publicly available database, promising results are provided regarding the human identification strength of movement-specific experts, and it is indicated that the combination of the outputs of the experts increases the robustness of the human recognition algorithm.
Abstract: In this paper a multi-modal method for human identification that exploits the discriminant features derived from several movement types performed by the same human is proposed. Utilizing a fuzzy vector quantization (FVQ) and linear discriminant analysis (LDA) based algorithm, an unknown movement is first classified, and, then, the person performing the movement is recognized by a movement-specific person recognition expert. In case the unknown person performs more than one movement, a multi-modal algorithm combines the scores of the individual experts to yield the final decision for the identity of the unknown human. Using a publicly available database, we provide promising results regarding the human identification strength of movement-specific experts, and we indicate that the combination of the outputs of the experts increases the robustness of the human recognition algorithm.