Showing papers by "Ioannis Pitas" published in 2007


Journal ArticleDOI
TL;DR: Two novel methods for facial expression recognition in facial image sequences are presented; both feed the geometrical displacement of tracked Candide grid nodes to multiclass SVMs, which recognize either the six basic facial expressions or a set of chosen Facial Action Units.
Abstract: In this paper, two novel methods for facial expression recognition in facial image sequences are presented. The user has to manually place some of the Candide grid nodes on face landmarks depicted in the first frame of the image sequence under examination. The grid-tracking and deformation system used, based on deformable models, tracks the grid in consecutive video frames over time, as the facial expression evolves, until the frame that corresponds to the greatest facial expression intensity. The geometrical displacement of certain selected Candide nodes, defined as the difference of the node coordinates between the first frame and the frame of greatest facial expression intensity, is used as an input to a novel multiclass Support Vector Machine (SVM) system of classifiers that are used to recognize either the six basic facial expressions or a set of chosen Facial Action Units (FAUs). The results on the Cohn-Kanade database show a recognition accuracy of 99.7% for facial expression recognition using the proposed multiclass SVMs and 95.1% for facial expression recognition based on FAU detection.

676 citations
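
The displacement feature described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `displacement_vector`, the three-node toy grids and the (dx, dy) feature layout are assumptions, and the grid tracking and SVM stages are omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def displacement_vector(first: List[Point], peak: List[Point]) -> List[float]:
    """Concatenate (dx, dy) for every node: peak-frame minus first-frame coordinates."""
    if len(first) != len(peak):
        raise ValueError("both frames must contain the same number of grid nodes")
    features: List[float] = []
    for (x0, y0), (x1, y1) in zip(first, peak):
        features.extend([x1 - x0, y1 - y0])
    return features


# Hypothetical node positions (three nodes only, for illustration).
first_frame = [(10.0, 20.0), (30.0, 20.0), (20.0, 40.0)]
peak_frame = [(10.0, 18.0), (31.0, 18.5), (20.0, 45.0)]

features = displacement_vector(first_frame, peak_frame)
print(features)  # [0.0, -2.0, 1.0, -1.5, 0.0, 5.0]
```

A vector of this kind would then be passed to whatever classifier is chosen; the abstract names multiclass SVMs.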


Journal ArticleDOI
TL;DR: A novel DNMF method that uses projected gradients is presented; it employs some extra modifications that make it more suitable for classification tasks.
Abstract: The methods introduced so far regarding discriminant non-negative matrix factorization (DNMF) do not guarantee convergence to a stationary limit point. In order to remedy this limitation, a novel DNMF method is presented that uses projected gradients. The proposed algorithm employs some extra modifications that make the method more suitable for classification tasks. The usefulness of the proposed technique to frontal face verification and facial expression recognition problems is demonstrated.

106 citations


Journal ArticleDOI
TL;DR: A novel algorithm that can be used to boost the performance of face-verification methods that utilize Fisher's criterion is presented and evaluated; experimental results indicate that the proposed framework greatly improves face-verification performance.
Abstract: A novel algorithm that can be used to boost the performance of face-verification methods that utilize Fisher's criterion is presented and evaluated. The algorithm is applied to similarity, or matching error, data and provides a general solution for overcoming the "small sample size" (SSS) problem, where the lack of sufficient training samples causes improper estimation of a linear separation hyperplane between the classes. Two independent phases constitute the proposed method. Initially, a set of weighted piecewise discriminant hyperplanes is used in order to provide a more accurate discriminant decision than the one produced by the traditional linear discriminant analysis (LDA) methodology. The expected classification ability of this method is investigated through a series of simulations. The second phase defines proper combinations for person-specific similarity scores and describes an outlier removal process that further enhances the classification ability. The proposed technique has been tested on the M2VTS and XM2VTS frontal face databases. Experimental results indicate that the proposed framework greatly improves the face-verification performance.

90 citations


Journal ArticleDOI
TL;DR: A latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain-specific web documents and is compared with other well-known web information retrieval techniques.

84 citations


Journal ArticleDOI
TL;DR: The effectiveness of the proposed approach is demonstrated by comparing it with the standard SVMs and other classifiers, like kernel Fisher discriminant analysis in facial image characterization problems like gender determination, eyeglass, and neutral facial expression detection.
Abstract: In this paper, a modified class of support vector machines (SVMs) inspired by the optimization of Fisher's discriminant ratio is presented, the so-called minimum class variance SVMs (MCVSVMs). The MCVSVMs optimization problem is solved in cases in which the training set contains fewer samples than the dimensionality of the training vectors, using dimensionality reduction through principal component analysis (PCA). Afterward, the MCVSVMs are extended in order to find nonlinear decision surfaces by solving the optimization problem in arbitrary Hilbert spaces defined by Mercer's kernels. In that case, it is shown that, under kernel PCA, the nonlinear optimization problem is transformed into an equivalent linear MCVSVMs problem. The effectiveness of the proposed approach is demonstrated by comparing it with the standard SVMs and other classifiers, like kernel Fisher discriminant analysis in facial image characterization problems like gender determination, eyeglass, and neutral facial expression detection.

80 citations


Journal ArticleDOI
TL;DR: Novel nonlinear subspace methods for face verification that outperform other commonly used kernel approaches such as kernel-PCA, kernel direct discriminant analysis, and other two-class and multiclass variants of kernel-discriminant analysis based on Fisher's criterion are proposed.
Abstract: In this paper, novel nonlinear subspace methods for face verification are proposed. The problem of face verification is considered as a two-class problem (genuine versus impostor class). The typical Fisher's linear discriminant analysis (FLDA) gives only one or two projections in a two-class problem. This is a very strict limitation to the search of discriminant dimensions. As for the FLDA for N class problems (N is greater than two), the transformation is not person specific. In order to remedy these limitations of FLDA, exploit the individuality of human faces and take into consideration the fact that the distribution of facial images, under different viewpoints, illumination variations, and facial expression is highly complex and nonlinear, novel kernel-discriminant algorithms are proposed. The new methods are tested in the face verification problem using the XM2VTS, AR, ORL, Yale, and UMIST databases where it is verified that they outperform other commonly used kernel approaches such as kernel-PCA (KPCA), kernel direct discriminant analysis (KDDA), complete kernel Fisher's discriminant analysis (CKFDA), the two-class KDDA, CKFDA, and other two-class and multiclass variants of kernel-discriminant analysis based on Fisher's criterion.

55 citations


Journal ArticleDOI
TL;DR: A comprehensive review of a number of (semi-)automated FISH and IHC image processing systems, focusing on the algorithmic aspects of each technique, verifies the increasingly important role of such methods in FISH and IHC.
Abstract: Fluorescent in-situ hybridization (FISH) and immunohistochemistry (IHC) constitute a pair of complementary techniques for detecting gene amplification and overexpression, respectively. The advantages of IHC include relatively cheap materials and high sample durability, while FISH is the more accurate and reproducible method. Evaluation of FISH and IHC images is still largely performed manually, with automated or semiautomated techniques increasing in popularity. Here, we provide a comprehensive review of a number of (semi-)automated FISH and IHC image processing systems, focusing on the algorithmic aspects of each technique. Our review verifies the increasingly important role of such methods in FISH and IHC; however, manual intervention is still necessary in order to resolve particularly challenging or ambiguous cases. In addition, large-scale validation is required in order for these systems to enter standard clinical practice.

52 citations


Proceedings ArticleDOI
01 Apr 2007
TL;DR: An algorithm to cluster face images found in video sequences is proposed; a novel method for creating a dissimilarity matrix using SIFT image features is introduced, and this matrix is fed to a hierarchical average linkage clustering algorithm, which yields the clustering result.
Abstract: In this paper, an algorithm to cluster face images found in video sequences is proposed. A novel method for creating a dissimilarity matrix using SIFT image features is introduced. This dissimilarity matrix is used as an input to a hierarchical average linkage clustering algorithm, which yields the clustering result. Three well-known clustering validity measures are provided to assess the quality of the resulting clustering, namely the F measure, the overall entropy (OE) and the Gamma statistic. The final result is found to be quite robust to the significant scale, pose and illumination variations encountered in facial images.

39 citations
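
The clustering stage named in the abstract above, hierarchical average linkage applied to a precomputed dissimilarity matrix, can be sketched as follows. This is a hedged illustration only: the SIFT-based construction of the matrix is not described in enough detail here to reproduce, so the 4x4 matrix below holds made-up values, and the function name `average_linkage` and the fixed target number of clusters are assumptions.

```python
from typing import List


def average_linkage(dissim: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering that merges the pair of clusters with the
    smallest mean pairwise dissimilarity until n_clusters remain."""
    if not 1 <= n_clusters <= len(dissim):
        raise ValueError("n_clusters must be between 1 and the number of items")
    clusters = [[i] for i in range(len(dissim))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        return sum(dissim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_dist(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Made-up dissimilarities between four face images: 0 and 1 are similar, as are 2 and 3.
toy_dissim = [
    [0.00, 0.10, 0.90, 0.80],
    [0.10, 0.00, 0.85, 0.90],
    [0.90, 0.85, 0.00, 0.20],
    [0.80, 0.90, 0.20, 0.00],
]

groups = average_linkage(toy_dissim, n_clusters=2)
print(groups)  # [[0, 1], [2, 3]]
```

The cubic-time search over cluster pairs keeps the sketch short; a library routine would be the practical choice for real data.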


Journal ArticleDOI
TL;DR: An enhanced set of eigen-audioframes is created that is related to an audio signal subspace, where audio background changes are easily discovered, and a novel process is developed in order to detect audio scene change candidates in this subspace.
Abstract: In this paper, a novel audio-visual scene change detection algorithm is presented and evaluated experimentally. An enhanced set of eigen-audioframes is created that is related to an audio signal subspace, where audio background changes are easily discovered. An analysis is presented that justifies why this subspace favors scene change detection. Additionally, a novel process is developed in order to detect audio scene change candidates in this subspace. Visual information is used to align audio scene change indications with neighboring video shot changes and, accordingly, to reduce the false alarm rate of the audio-only scene change detection. Moreover, video fade effects are identified and used independently in order to track scene changes. The false alarm rate is reduced further by extracting acoustic features in order to verify that the scene change indications are valid. The detection methodology was tested on newscast videos provided by the TRECVID2003 video test set. The experimental results demonstrate that the proposed method achieves an F-measure exceeding 0.85. Accordingly, it effectively tackles the scene change detection problem.

24 citations


Journal ArticleDOI
TL;DR: A novel algorithm for finding discriminant person-specific facial models is proposed and tested for frontal face verification; it significantly enhances the performance of elastic graph matching in frontal face verification.
Abstract: In this paper, a novel algorithm for finding discriminant person-specific facial models is proposed and tested for frontal face verification. The most discriminant features of a person's face are found and a deformable model is placed in the spatial coordinates that correspond to these discriminant features. The discriminant deformable models, for verifying the person's identity, that are learned through this procedure are elastic graphs that are dense in the facial areas considered discriminant for a specific person and sparse in other less significant facial areas. The discriminant graphs are enhanced by a discriminant feature selection method for the graph nodes in order to find the most discriminant jet features. The proposed approach significantly enhances the performance of elastic graph matching in frontal face verification.

22 citations


Journal ArticleDOI
01 Jul 2007-Proteins
TL;DR: The results could be useful for providing better insight into the functional importance of metal-coordinating residues, possibly aiding metal binding site prediction and design, metal-protein complex structure prediction, drug discovery, as well as model fitting to electron-density maps produced by X-ray crystallography.
Abstract: As a result of rapid advances in genome sequencing, the pace of discovery of new protein sequences has surpassed that of structure and function determination by orders of magnitude. This is also true for metal-binding proteins, that is, proteins that bind one or more metal atoms necessary for their biological function. While metal binding site geometry and composition have been extensively studied, no large-scale investigation of metal-coordinating residue conservation has been pursued so far. In pursuing this analysis, we were able to corroborate anecdotal evidence that certain residues are preferred to others for binding to certain metals. The conservation of most metal-coordinating residues is correlated with residue preference in a statistically significant manner. Additionally, we established a statistically significant difference in conservation between metal-coordinating and noncoordinating residues. These results could be useful for providing better insight into the functional importance of metal-coordinating residues, possibly aiding metal binding site prediction and design, metal-protein complex structure prediction, drug discovery, as well as model fitting to electron-density maps produced by X-ray crystallography.

Journal ArticleDOI
TL;DR: A novel algorithm is proposed for optimally reducing object descriptions for object matching purposes; it considers simplified objects, thus reducing the number of pixels involved in the matching process, and experimental results are presented.
Abstract: This paper proposes a novel algorithm for an optimal reduction of object description for object matching purposes. Our aim is to decrease the computation needs by considering simplified objects, thus reducing the number of pixels involved in the matching process. We develop the appropriate theoretical background based on centroidal Voronoi tessellations. Its use within the chamfer matching framework is also discussed. We present experimental results regarding the performance of this approach for 2-D contour and region-like object matching. As a special case, we investigate how the snake based representation of target objects can be employed in chamfer matching. The experimental results concern the use of object part matching for recognizing humans and show how the proposed simplification leads to valid replacements of the original templates.
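
One common way to approximate a centroidal Voronoi tessellation on a discrete point set is Lloyd's iteration, sketched below on toy contour points. This is an assumption-laden illustration rather than the paper's algorithm: the function name `lloyd_cvt`, the uniform point weighting, the evenly spaced initialization and the toy data are all hypothetical, and the chamfer matching stage is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lloyd_cvt(points: List[Point], k: int, iterations: int = 20) -> List[Point]:
    """Reduce `points` to k generators by alternating nearest-generator
    assignment (Voronoi cells) and moving each generator to its cell centroid."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    step = len(points) / k
    generators = [points[int(i * step)] for i in range(k)]  # evenly spaced start
    for _ in range(iterations):
        cells: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda g: (p[0] - generators[g][0]) ** 2 + (p[1] - generators[g][1]) ** 2,
            )
            cells[nearest].append(p)
        updated = []
        for gen, cell in zip(generators, cells):
            if cell:
                cx = sum(x for x, _ in cell) / len(cell)
                cy = sum(y for _, y in cell) / len(cell)
                updated.append((cx, cy))
            else:
                updated.append(gen)  # keep a generator whose cell is empty
        if updated == generators:
            break  # converged: every generator is the centroid of its cell
        generators = updated
    return generators


# Toy "template": two square clumps of four points each.
template = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
simplified = lloyd_cvt(template, k=2)
print(simplified)  # [(1.0, 1.0), (11.0, 11.0)]
```

Here eight template points are replaced by two representatives, which is the kind of reduction that lowers the number of points compared during matching.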

Proceedings ArticleDOI
01 Feb 2007
TL;DR: This paper aims at providing a quantitative description of shot types commonly used in movie productions; a database of samples labeled by cinematography experts was generated, and the proposed approach gave promising results on it.
Abstract: This paper aims at providing a quantitative description of shot types commonly used in movie productions. Only qualitative descriptions are available in the literature and even these are subject to various naming conventions. A vocabulary is fixed and human body-based rules are defined to extract the shot types. A database was generated with a set of samples labeled by cinematography experts. The proposed approach was tested on the set of samples providing promising results.

Journal ArticleDOI
TL;DR: The proposed method exploits the individuality of the human face and the discriminant information of elastic graph matching in order to improve the verification performance of elastic graph matching.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: A novel class of support vector machines (SVM) is introduced to deal with facial expression recognition, and the proposed classifier incorporates statistical information about the classes under examination into the classical SVM.
Abstract: In this paper, a novel class of support vector machines (SVM) is introduced to deal with facial expression recognition. The proposed classifier incorporates statistical information about the classes under examination into the classical SVM. The developed system performs facial expression recognition in facial videos. The grid tracking and deformation algorithm used tracks the Candide grid over time as the facial expression evolves, until the frame that corresponds to the greatest facial expression intensity. The geometrical displacement of Candide nodes is used as an input to the bank of novel SVM classifiers, which are utilized to recognize the six basic facial expressions. The experiments on the Cohn-Kanade database show a recognition accuracy of 98.2%.

Proceedings Article
01 Jan 2007
TL;DR: A novel method for eye and mouth detection and for eye center and mouth corner localization, based on geometrical information, is presented; it can work efficiently on low-resolution images and has been tested on the XM2VTS database with very good results.
Abstract: In this paper, a novel method for eye and mouth detection and for eye center and mouth corner localization, based on geometrical information, is presented. First, a face detector is applied to detect the facial region, and the edge map of this region is extracted. A vector pointing to the closest edge pixel is then assigned to every pixel. The x and y components of these vectors are used to detect the eyes and mouth. For eye center localization, intensity information is used, after removing unwanted effects, such as light reflections. For the detection of the mouth corners, the hue channel of the lip area is used. The proposed method can work efficiently on low-resolution images and has been tested on the XM2VTS database with very good results.
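
The step in which every pixel receives a vector pointing to its closest edge pixel can be sketched with a brute-force search. This is a minimal illustration under stated assumptions: the function name `nearest_edge_vectors`, the binary list-of-lists edge map and the (dx, dy) ordering are hypothetical, and the face detector, edge extraction and the later detection stages are not shown.

```python
from typing import List, Tuple


def nearest_edge_vectors(edge_map: List[List[int]]) -> List[List[Tuple[int, int]]]:
    """For every pixel, return (dx, dy) pointing to the closest edge pixel.
    Brute force over all edge pixels; fine for small maps only."""
    edges = [(r, c) for r, row in enumerate(edge_map) for c, v in enumerate(row) if v]
    if not edges:
        raise ValueError("edge map contains no edge pixels")
    field = []
    for r, row in enumerate(edge_map):
        out_row = []
        for c in range(len(row)):
            er, ec = min(edges, key=lambda e: (e[0] - r) ** 2 + (e[1] - c) ** 2)
            out_row.append((ec - c, er - r))  # x component first, then y
        field.append(out_row)
    return field


# Toy 3x3 edge map with a single edge pixel in the centre.
toy_edges = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

field = nearest_edge_vectors(toy_edges)
print(field[0][0])  # (1, 1): the edge lies one pixel right and one pixel down
print(field[2][1])  # (0, -1): the edge lies one pixel up
```

On full-size images a distance transform that also records the nearest-pixel index would replace the quadratic search.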

Journal ArticleDOI
TL;DR: A novel framework for audio-assisted dialogue detection based on indicator functions and neural networks is investigated, using ground-truth indicator functions determined by human observers on six different movies to validate the feasibility of the approach.

Journal ArticleDOI
TL;DR: The proposed method was experimentally shown to be more precise and robust than both KLT and SIFT tracking, and the feature-point selection scheme was tested against the SIFT and Harris feature points, and it was demonstrated to provide superior results.
Abstract: This paper presents a novel approach for selecting and tracking feature points in video sequences. In this approach, the image intensity is represented by a 3-D deformable surface model. The proposed approach relies on selecting and tracking feature points by exploiting the so-called generalized displacement vector that appears in the explicit surface deformation governing equations. This vector is proven to be a combination of the output of various line- and edge-detection masks, thus leading to distinct, robust features. The proposed method was compared, in terms of tracking accuracy and robustness, with a well-known tracking algorithm, Kanade-Lucas-Tomasi (KLT), and a tracking algorithm based on scale-invariant feature transform (SIFT) features. The proposed method was experimentally shown to be more precise and robust than both KLT and SIFT tracking. Moreover, the feature-point selection scheme was tested against the SIFT and Harris feature points, and it was demonstrated to provide superior results.

Journal ArticleDOI
TL;DR: The singular-value decomposition (SVD) method is used to derive a refined low-dimensional feature space from the high-dimensional raw feature space, where similar video patterns are placed together and can be easily clustered.
Abstract: We deal with video shot-cut detection in digital videos using the singular-value decomposition (SVD). SVD is performed on a matrix whose columns are the 3D frame color histograms. We have used SVD for its capabilities to derive a refined low-dimensional feature space from the high-dimensional raw feature space, where similar video patterns are placed together and can be easily clustered. After SVD is performed, a two-phase process is employed to detect the shots. In the first phase, a dynamic clustering method is used to create the frame clusters. In the second phase, every two consecutive clusters, obtained by the clustering procedure, are tested for a possible merging in order to reduce false shot-cut detections. In the merging phase, statistical hypothesis testing is used. The detection technique was applied to several TRECVID video test sets that exhibit different types of shots and contain significant object and camera motion inside the shots. We demonstrate that the method detects cuts and gradual transitions, such as dissolves and fades, with high accuracy.

Proceedings ArticleDOI
03 Sep 2007
TL;DR: A GVF-snake is used to find the silhouette of the user, and experiments show very promising results for recognizing pointing gestures with a single camera.
Abstract: In this paper, a method for recognizing pointing gestures without markers is proposed. The video-based system uses one camera, which observes the user in front of a screen and identifies the points on the screen that the user points at with a fully extended arm. A GVF-snake was used in order to find the silhouette of the user. From the silhouette, features such as the position where the person is standing, the position of the fingertip, and the position of the shoulder are extracted, tracked and used to construct a feature vector for each video frame. This vector is fed to properly trained multi-class support vector machines (SVM) in order to obtain the 2D position of the target point on the screen. Two different camera setups with different feature vector configurations are proposed and tested. Experiments show very promising results for recognizing the pointing gestures by using a single camera.

Proceedings Article
01 Sep 2007
TL;DR: This paper introduces a new technique for watermarking city maps by altering a basic parameter of a road segment, namely its width to length ratio; the watermark is embedded in the map using an appropriate quantization of this ratio.
Abstract: Geographic Information System (GIS) data is a valuable asset that should be protected using digital rights management (DRM) techniques. This paper introduces a new technique for watermarking city maps by altering a basic parameter of a road segment, namely its width to length ratio. A watermark is embedded in the map using an appropriate quantization of this ratio. The watermarked map retains its visual quality since the quadrilateral shape of the buildings and their alignment with road boundaries are maintained. The proposed approach performs blind detection through correlation of the detected watermark with the watermark under investigation and is robust against several attacks such as rotation, translation, uniform scaling and additive white Gaussian noise (AWGN).
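
Embedding a watermark bit by quantizing a scalar such as the width to length ratio can be illustrated with plain quantization index modulation (QIM). The abstract does not say which quantizer is used, and it states that detection is done by correlation; the sketch below swaps in nearest-lattice decoding for brevity. The names `embed_bit` and `detect_bit` and the step size 0.05 are assumptions.

```python
def embed_bit(ratio: float, bit: int, step: float = 0.05) -> float:
    """Snap `ratio` to the nearest point of the lattice assigned to `bit`:
    bit 0 uses multiples of `step`, bit 1 the same lattice shifted by step / 2."""
    offset = bit * step / 2
    return round((ratio - offset) / step) * step + offset


def detect_bit(ratio: float, step: float = 0.05) -> int:
    """Return the bit whose lattice lies closest to the observed ratio."""
    d0 = abs(ratio - embed_bit(ratio, 0, step))
    d1 = abs(ratio - embed_bit(ratio, 1, step))
    return 0 if d0 <= d1 else 1


# Hypothetical road segment with width / length = 0.37.
original_ratio = 0.37
marked_zero = embed_bit(original_ratio, 0)   # close to 0.35
marked_one = embed_bit(original_ratio, 1)    # close to 0.375

print(detect_bit(marked_zero))        # 0
print(detect_bit(marked_one))         # 1
print(detect_bit(marked_one + 0.005)) # 1, a small perturbation is tolerated
```

A ratio is unchanged by rotation, translation and uniform scaling of the map, which is consistent with the robustness properties listed in the abstract; the step size trades distortion of the segment against tolerance to noise.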

Journal ArticleDOI
TL;DR: The proposed blind watermarking method is proven to be resistant to 3-D lowpass filtering, noise addition, scaling, translation, cropping and rotation, and it exploits a special symmetry of the watermark to decrease the detection time.
Abstract: In this paper, a robust blind watermarking method for 3-D volumes is presented. A bivalued watermark is embedded in the Fourier transform magnitude of the 3-D volume. The Fourier domain has been selected because of its scaling and rotation invariance. Furthermore, in order to decrease the detection time, a special symmetry of the watermark is exploited. The proposed method is proven to be resistant to 3-D lowpass filtering, noise addition, scaling, translation, cropping and rotation. Experimental results prove the robustness of this method against the above-mentioned attacks.

Proceedings ArticleDOI
01 Feb 2007
TL;DR: A novel approach for estimating 3D head pose in single-view video sequences by using a feature vector which is a by-product of the equations that govern the deformation of the surface model used in the tracking.
Abstract: This paper presents a novel approach for estimating 3D head pose in single-view video sequences. Following initialization by a face detector, a tracking technique that utilizes a 3D deformable surface model to approximate the image intensity is used to track the face in the video sequence. Head pose estimation is performed by using a feature vector which is a by-product of the equations that govern the deformation of the surface model used in the tracking. The aforementioned vector is used for training support vector machines (SVM) in order to estimate the 3D head pose. The proposed method was applied to the IDIAP head pose estimation database. The obtained results show that the proposed method can achieve an accuracy of 82% if angles are estimated in 10° increments and 75% if angles are estimated in 5° increments.

Journal ArticleDOI
TL;DR: Experimental results show that DMT, which includes an embedded compression ratio selection mechanism, has excellent energy compaction properties and achieves comparable compression results to DCT at low compression ratios, while being in general better than DCT at high compression ratios.
Abstract: This paper introduces the discrete modal transform (DMT), a 1D and 2D discrete, non-separable transform for signal processing, which, in the mathematical sense, is a generalization of the well-known discrete cosine transform (DCT). A 3D deformable surface model is used to represent the image intensity and the introduced discrete transform is a by-product of the explicit surface deformation governing equations. The properties of the proposed transform are similar to those of the DCT. To illustrate these properties, the proposed transform is applied to lossy image compression and the obtained results are compared to those of a DCT-based compression scheme. Experimental results show that DMT, which includes an embedded compression ratio selection mechanism, has excellent energy compaction properties and achieves comparable compression results to DCT at low compression ratios, while being in general better than DCT at high compression ratios.


Proceedings ArticleDOI
13 Jul 2007
TL;DR: This work presents a novel method for analyzing FISH images based on the statistical properties of Radial Basis Functions; the method was evaluated on a data set of 100 breast carcinoma cases provided by the Aristotle University of Thessaloniki and the University of Pisa, with promising results.
Abstract: Fluorescent in situ hybridization (FISH) is a valuable method for determining Her-2/neu status in breast carcinoma samples, an important prognostic indicator. Visual evaluation of FISH images is a difficult task which involves manual counting of dots in multiple images, a procedure which is both time consuming and prone to human error. A number of algorithms have recently been developed dealing with (semi-)automated analysis of FISH images. These algorithms are quite promising, but further improvement in their accuracy is required. Here, we present a novel method for analyzing FISH images based on the statistical properties of Radial Basis Functions. Our method was evaluated on a data set of 100 breast carcinoma cases provided by the Aristotle University of Thessaloniki and the University of Pisa, with promising results.

Journal ArticleDOI
TL;DR: A complete functional system capable of detecting people and tracking their motion in either live camera feed or pre-recorded video sequences, and can track any object of interest, so long as there are enough features to track.
Abstract: This paper presents a complete functional system capable of detecting people and tracking their motion in either live camera feed or pre-recorded video sequences. The system consists of two main modules, namely the detection and tracking modules. Automatic detection aims at locating human faces and is based on fusion of color and feature-based information. Thus, it is capable of handling faces in different orientations and poses (frontal, profile, intermediate). To avoid false detections, a number of decision criteria are employed. Tracking is performed using a variant of the well-known Kanade-Lucas-Tomasi tracker, while occlusion is handled through a re-detection stage. Manual intervention is allowed to assist both modules if required. In manual mode, the system can track any object of interest, so long as there are enough features to track. The system caters for calibrated cameras and can provide 3-D coordinates of any tracked object(s) of interest. It has been tested with very good results on a variety of video sequences, including a database of studio video sequences, for which 3-D ground truth data, originating from a 4-camera infrared tracking system, exist.

Proceedings ArticleDOI
01 Aug 2007
TL;DR: A novel method for the recognition of facial expressions in videos is proposed; it extracts the deformed Candide facial grid that corresponds to the facial expression depicted in the video sequence, uses the mean Euclidean distance of the deformed grids to create a new space through metric multidimensional scaling, and classifies with multiclass SVMs defined in that space.
Abstract: In this paper, a novel method for the recognition of facial expressions in videos is proposed. The system first extracts the deformed Candide facial grid that corresponds to the facial expression depicted in the video sequence. The mean Euclidean distance of the deformed grids is then calculated to create a new metric multidimensional scaling. The classification of the sample under examination to one of the 7 possible classes of facial expressions, i.e., anger, disgust, fear, happiness, sadness, surprise and neutral, is performed using multiclass SVMs defined in the new space. The experiments were performed using the Cohn-Kanade database and the results show that the above mentioned system can achieve an accuracy of 95.6%.
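
The mean Euclidean distance between deformed grids can be read as the average, over corresponding nodes, of the node-to-node distance. That reading is an assumption, as are the names `mean_grid_distance` and `dissimilarity_matrix` and the two-node toy grids; the sketch below only builds the pairwise matrix that a metric multidimensional scaling step would take as input and does not perform the scaling or the SVM classification.

```python
import math
from typing import List, Tuple

Grid = List[Tuple[float, float]]


def mean_grid_distance(a: Grid, b: Grid) -> float:
    """Average Euclidean distance between corresponding nodes of two grids."""
    if len(a) != len(b) or not a:
        raise ValueError("grids must be non-empty and have the same number of nodes")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def dissimilarity_matrix(grids: List[Grid]) -> List[List[float]]:
    """Symmetric matrix of pairwise mean grid distances."""
    return [[mean_grid_distance(g, h) for h in grids] for g in grids]


# Hypothetical deformed grids with two nodes each.
grid_a = [(0.0, 0.0), (1.0, 0.0)]
grid_b = [(3.0, 4.0), (1.0, 0.0)]

matrix = dissimilarity_matrix([grid_a, grid_b])
print(matrix)  # [[0.0, 2.5], [2.5, 0.0]]
```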

Proceedings ArticleDOI
12 Nov 2007
TL;DR: A novel algorithm that can handle the verification problem more efficiently than traditional LDA is presented and various statistical observations are made about the discriminant coefficients that are generated.
Abstract: When linear discriminant analysis (LDA) is employed, the correct classification of a sample heavily depends on having an adequately large training set. This is often not possible in practical applications, such as person verification, where the lack of sufficient training samples causes improper estimation of a linear separation hyper-plane between the two classes. To overcome this shortcoming a novel algorithm that can handle the verification problem more efficiently than traditional LDA is presented. The dimensionality of the samples is reduced by breaking them down, thus creating subsets of smaller dimensionality feature vectors, and applying discriminant analysis on each subset. The resulting discriminant weight sets are themselves weighted under a normalization criterion, making the discriminant functions continuous in this sense. A series of simulations that formulate the face verification problem illustrate the cases for which our method outperforms traditional LDA and various statistical observations are made about the discriminant coefficients that are generated.

Proceedings ArticleDOI
13 Jul 2007
TL;DR: A method for template simplification is presented, where the template is used to find interesting objects within an image; computational performance improves because fewer template points are matched when a simplified template is used.
Abstract: In this paper, we present a novel method for template simplification, where the template is used to find interesting objects within an image. In this way, we improve computational performance since fewer template points are matched using a simplified template. Moreover, we increase the reliability of the matching, as we keep the template points that capture the main shape behavior (skeleton) of the template. The theoretical background of the simplification is derived through the centroidal Voronoi tessellation framework. The efficiency of the proposed approach is demonstrated by detecting human appearance in thermal images.