
Showing papers by "Ioannis Pitas published in 2006"


Proceedings ArticleDOI
03 Apr 2006
TL;DR: The difficulties involved in the construction of such a multimodal emotion database are presented and the different protocols that have been used to cope with these difficulties are described.
Abstract: This paper presents an audio-visual emotion database that can be used as a reference database for testing and evaluating video, audio or joint audio-visual emotion recognition algorithms. Additional uses may include the evaluation of algorithms performing other multimodal signal processing tasks, such as multimodal person identification or audio-visual speech recognition. This paper presents the difficulties involved in the construction of such a multimodal emotion database and the different protocols that have been used to cope with these difficulties. It describes the experimental setup used for the experiments and includes a section related to the segmentation and selection of the video samples, in such a way that the database contains only video sequences carrying the desired affective information. This database is made publicly available for scientific research purposes.

458 citations


Journal ArticleDOI
TL;DR: Two supervised methods for enhancing the classification accuracy of the Nonnegative Matrix Factorization (NMF) algorithm are presented and greatly enhance the performance of NMF for frontal face verification.
Abstract: In this paper, two supervised methods for enhancing the classification accuracy of the Nonnegative Matrix Factorization (NMF) algorithm are presented. The idea is to extend the NMF algorithm in order to extract features that enforce not only spatial locality, but also separability between classes in a discriminant manner. The first method applies discriminant analysis to the features derived from NMF. In this way, a two-phase discriminant feature extraction procedure is implemented, namely NMF plus Linear Discriminant Analysis (LDA). The second method incorporates the discriminant constraints inside the NMF decomposition. Thus, a decomposition of a face into its discriminant parts is obtained, and new update rules for both the weights and the basis images are derived. The introduced methods have been applied to the problem of frontal face verification using the well-known XM2VTS database. Both methods greatly enhance the performance of NMF for frontal face verification.
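The two-phase NMF-plus-LDA pipeline described above can be sketched with off-the-shelf scikit-learn components. This is only an illustration on toy data, not the authors' implementation (which derives custom update rules); the component counts and the random data are assumptions:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Toy stand-in for vectorized face images: 60 samples, 100 pixels, 3 classes.
X = np.abs(rng.normal(size=(60, 100)))
y = np.repeat([0, 1, 2], 20)

# Phase 1: part-based, non-negative coefficients from NMF.
nmf = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
H = nmf.fit_transform(X)          # one coefficient vector per sample

# Phase 2: discriminant projection of the NMF coefficients.
lda = LinearDiscriminantAnalysis()
Z = lda.fit_transform(H, y)       # at most (n_classes - 1) dimensions

print(Z.shape)
```

The second method in the paper (discriminant constraints inside the decomposition itself) has no drop-in library equivalent, which is precisely why the authors derive new update rules.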

330 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the method detects both fades and abrupt cuts with high accuracy and it is shown that it captures satisfactorily the visual content of the shot.
Abstract: New methods for detecting shot boundaries in video sequences and for extracting key frames using metrics based on information theory are proposed. The method for shot boundary detection relies on the mutual information (MI) and the joint entropy (JE) between the frames. It can detect cuts, fade-ins and fade-outs. The detection technique was tested on the TRECVID2003 video test set having different types of shots and containing significant object and camera motion inside the shots. It is demonstrated that the method detects both fades and abrupt cuts with high accuracy. The information theory measure provides us with better results because it exploits the inter-frame information in a more compact way than frame subtraction. It was also successfully compared to other methods published in the literature. The method for key frame extraction uses MI as well. We show that it captures satisfactorily the visual content of the shot.
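The inter-frame mutual information the detector relies on can be sketched from the joint intensity histogram of two frames. The bin count and the random toy frames below are illustrative assumptions; within a shot MI stays high, while across a cut it drops sharply:

```python
import numpy as np

def mutual_information(frame_a, frame_b, bins=32):
    """MI (in bits) between two grayscale frames via their joint histogram."""
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of frame_a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of frame_b
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(64, 64))
b = rng.integers(0, 256, size=(64, 64))   # statistically unrelated frame

# High MI for a frame against itself, near-zero MI across a "cut".
print(mutual_information(a, a), mutual_information(a, b))
```

A shot-boundary detector would threshold dips of this measure along the frame sequence.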

311 citations


Journal ArticleDOI
TL;DR: This paper focuses not on the high-level video analysis tasks themselves but on the common basic techniques that have been developed to facilitate them, including shot boundary detection and condensed video representation.
Abstract: There is an urgent need to develop techniques that organize video data into more compact forms or extract semantically meaningful information. Such operations can serve as a first step for a number of different data access tasks such as browsing, retrieval, genre classification, and event detection. In this paper, we focus not on the high-level video analysis tasks themselves but on the common basic techniques that have been developed to facilitate them. These basic tasks are shot boundary detection and condensed video representation.

282 citations


Journal ArticleDOI
TL;DR: An integrated methodology for the detection and removal of cracks on digitized paintings is presented and has been shown to perform very well ondigitized paintings suffering from cracks.
Abstract: An integrated methodology for the detection and removal of cracks on digitized paintings is presented in this paper. The cracks are detected by thresholding the output of the morphological top-hat transform. Afterward, the thin dark brush strokes which have been misidentified as cracks are removed using either a median radial basis function neural network on hue and saturation data or a semi-automatic procedure based on region growing. Finally, crack filling using order statistics filters or controlled anisotropic diffusion is performed. The methodology has been shown to perform very well on digitized paintings suffering from cracks.
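The first stage (thresholding a morphological top-hat output) can be sketched with SciPy's black top-hat, which highlights dark details smaller than the structuring element — matching the thin dark cracks the paper targets. The synthetic image, structuring-element size and threshold below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np
from scipy import ndimage

# Synthetic "painting": bright background with a thin dark crack.
img = np.full((50, 50), 200.0)
img[25, 5:45] = 40.0              # one-pixel-wide dark line

# Black top-hat = morphological closing minus the image:
# it responds strongly to dark structures thinner than the 5x5 element.
tophat = ndimage.black_tophat(img, size=5)

# Threshold the top-hat output to obtain a binary crack map.
crack_map = tophat > 80
print(crack_map[25, 10], crack_map[10, 10])
```

The paper's subsequent stages (brush-stroke removal, crack filling) would operate on this binary map.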

134 citations


Journal ArticleDOI
TL;DR: The average prediction error of SVMs is decomposed into bias and variance terms, the aggregation effect is defined, and it is demonstrated that support vector machines are stable classifiers.

34 citations


Journal ArticleDOI
TL;DR: The system was evaluated with receiver operating characteristic (ROC) analysis on a large database of 919 original images consisting of randomly drawn art images and similar images from specific categories, along with 30 transformed images for each original, totaling 27570 images.
Abstract: Typically, content-based image retrieval (CBIR) systems receive an image or an image description as input and retrieve images from a database that are similar to the query image in regard to properties such as color, texture, shape, or layout. A kind of system that did not receive much attention compared to CBIR systems is one that searches for images that are not merely similar but exact copies of the same image that have undergone some transformation. In this paper, we present such a system, referred to as an image fingerprinting system, since it aims to extract unique and robust image descriptors (in analogy to human fingerprints). We examine the use of color-based descriptors and provide comparisons for different quantization methods and histograms calculated using color-only and/or spatial-color information with different similarity measures. The system was evaluated with receiver operating characteristic (ROC) analysis on a large database of 919 original images consisting of randomly drawn art images and similar images from specific categories, along with 30 transformed images for each original, totaling 27570 images. The transformed images were produced with attacks that typically occur during digital image distribution, including different degrees of scaling, rotation, cropping, smoothing, additive noise and compression, as well as illumination contrast changes. Results showed a sensitivity of 96% at the small false positive fraction of 4% and a reduced sensitivity of 88% when 13% of all transformations involved changing the illuminance of the images. The overall performance of the system is encouraging for the use of color, and particularly spatial chromatic descriptors, for image fingerprinting.
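The core idea — a coarsely quantized color histogram as a robust fingerprint, compared with a histogram similarity measure — can be sketched as follows. The 64-bin quantization, the L1-based similarity and the toy images are assumptions for illustration, not the paper's exact descriptors:

```python
import numpy as np

def color_histogram(img, bins_per_channel=4):
    """Coarsely quantized RGB histogram, normalized to sum to 1."""
    q = (img // (256 // bins_per_channel)).reshape(-1, 3)
    idx = q[:, 0] * bins_per_channel**2 + q[:, 1] * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel**3)
    return hist / hist.sum()

def l1_similarity(h1, h2):
    """Histogram similarity in [0, 1] based on L1 distance."""
    return 1.0 - 0.5 * np.abs(h1 - h2).sum()

rng = np.random.default_rng(2)
original = rng.integers(0, 256, size=(32, 32, 3))
# A mild "attack": small additive noise, as in image distribution.
noisy = np.clip(original + rng.integers(-10, 11, size=original.shape), 0, 255)
# A chromatically different image (solid red).
unrelated = np.zeros((32, 32, 3), dtype=int)
unrelated[..., 0] = 255

h = color_histogram(original)
sim_replica = l1_similarity(h, color_histogram(noisy))      # stays high
sim_other = l1_similarity(h, color_histogram(unrelated))    # much lower
print(sim_replica, sim_other)
```

A fingerprinting system would declare a replica when the similarity exceeds a learned threshold; the paper's spatial chromatic variant additionally encodes where each color occurs.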

33 citations


Proceedings Article
01 Sep 2006
TL;DR: A novel method for eye detection and eye center localization, based on geometrical information is described, which can work on low-resolution images and has been tested on two face databases with very good results.
Abstract: A novel method for eye detection and eye center localization, based on geometrical information is described in this paper. First, a face detector is applied to detect the facial region, and the edge map of this region is extracted. A vector pointing to the closest edge pixel is then assigned to every pixel. Length and slope information for these vectors is used to detect the eyes. For eye center localization, intensity information is used. The proposed method can work on low-resolution images and has been tested on two face databases with very good results.
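The vector field the method builds on (for every pixel, a vector to the closest edge pixel) can be obtained directly from a Euclidean distance transform with index output. The toy edge map below is an assumption for illustration; the paper then analyses the length and slope statistics of these vectors to locate the eyes:

```python
import numpy as np
from scipy import ndimage

# Binary edge map (True = edge pixel) for a toy 7x7 facial region.
edges = np.zeros((7, 7), dtype=bool)
edges[2, 2] = edges[2, 4] = True          # two edge fragments

# The EDT of the non-edge pixels gives, for every pixel, the distance to
# the nearest edge pixel and (with return_indices=True) its coordinates.
dist, (near_r, near_c) = ndimage.distance_transform_edt(
    ~edges, return_indices=True)

vec_r = near_r - np.arange(7)[:, None]    # row component of each vector
vec_c = near_c - np.arange(7)[None, :]    # column component of each vector

# Pixel (5, 2) points straight up to the edge fragment at (2, 2).
print(dist[2, 3], vec_r[5, 2], vec_c[5, 2])
```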

32 citations


Journal ArticleDOI
TL;DR: Evaluating the utility of digital image processing and analysis procedures for the study and comparison of the efficiency of 2 root canal instrumentation techniques provided the ability to visualize dentin lost during root canal instrumentation, compare root canal morphology before and after instrumentation, and quantitatively evaluate the enlargement of the root canal area induced by each of the instrumentation techniques.
Abstract: Objective The objective of this study was to evaluate the utility of digital image processing and analysis procedures for the study and comparison of the efficiency of 2 root canal instrumentation techniques. Study Design Forty mandibular incisors with a single canal were randomly divided into 2 groups of 20 teeth. A step-back technique was followed for the instrumentation of the root canals of Group 1 teeth using hand stainless steel Hedstrom files (Dentsply Maillefer, Switzerland), while a crown-down technique using ProFile engine-driven nickel-titanium instruments (Dentsply Maillefer) was followed for the instrumentation of the Group 2 root canals. Radiographs of each tooth were taken in bucco-lingual and mesio-distal projections, both before and after instrumentation, under constant conditions and by using a direct digital intraoral radiography system. The postoperative radiographs were digitally subtracted from their respective preoperative radiographs. A contrast enhancement process was applied to the resultant digital subtractive images. The enlargement of the root canals created by each instrumentation technique regarding the apical 6 mm was assessed through the application of region segmentation and area measurement processes. Results Using this methodology no significant difference between the 2 preparation techniques was found in terms of configuration and enlargement of the root canals. Conclusions The application of this methodology provided the ability to (1) visualize dentin lost during root canal instrumentation, (2) simultaneously compare root canal morphology before and after instrumentation, and (3) quantitatively evaluate the enlargement of the root canal area induced by each of the instrumentation techniques.

23 citations


Journal ArticleDOI
TL;DR: An analysis of these three representations in connection with receptive field parameters such as spatial frequency, frequency orientation, position, length, width, and aspect ratio offers insight into how well these algorithms resemble biological visual perception systems.

22 citations


Proceedings ArticleDOI
09 Jul 2006
TL;DR: A virtual teeth drilling system named Virtual Dental Patient is introduced, designed to aid dentists in getting acquainted with teeth anatomy, the handling of drilling instruments and the challenges associated with the drilling procedure.
Abstract: This paper introduces a virtual teeth drilling system named Virtual Dental Patient, designed to aid dentists in getting acquainted with teeth anatomy, the handling of drilling instruments and the challenges associated with the drilling procedure. The basic aim of the system is to be used for the training of dental students. The application features a 3D model of the face and the oral cavity that can be adapted to the characteristics of a specific person and animated. Drilling using a haptic device is performed on realistic teeth models (constructed from real data) within the oral cavity. Results and intermediate steps of the drilling procedure can be saved for future use.

Proceedings ArticleDOI
09 Jul 2006
TL;DR: A new approach for face clustering is developed, where mutual information and joint entropy are exploited in order to create a metric for the clustering process, which guarantees some robustness against common distorting transformations such as scaling, cropping and pose changes.
Abstract: In this paper a new approach for face clustering is developed. Mutual information and joint entropy are exploited in order to create a metric for the clustering process. The way the joint entropy and the mutual information are calculated gives some interesting properties to the aforementioned metric, which guarantee some robustness against common distorting transformations such as scaling, cropping and pose changes. A slight preprocessing of the input face images is performed in order to address problems that arise from known detector errors.

Proceedings ArticleDOI
04 Sep 2006
TL;DR: It is argued that the increased number and large deviation of low-intensity pixels that the mouth region of a speaking person exhibits can be used as visual cues for detecting speech.
Abstract: In recent research efforts, the integration of visual cues into speech analysis systems has been proposed with favorable response. This paper introduces a novel approach for lip activity and visual speech detection. We argue that the increased number and large deviation of low-intensity pixels exhibited by the mouth region of a speaking person can be used as visual cues for detecting speech. We describe a statistical algorithm, based on detection theory, for the efficient characterization of speaking and silent intervals in video sequences. The proposed system has been tested on a number of video sequences with encouraging experimental results. Potential applications include speech intent detection, speaker determination and semantic video annotation.
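The low-intensity-pixel cue can be sketched as follows: count the dark pixels (the shadowed oral cavity) in the mouth region per frame, and flag intervals where that count fluctuates strongly. The intensity threshold, deviation threshold and synthetic frames are assumptions for illustration; the paper's actual decision rule is derived from detection theory:

```python
import numpy as np

def dark_pixel_counts(frames, roi, threshold=60):
    """Number of low-intensity pixels inside the mouth ROI, per frame."""
    r0, r1, c0, c1 = roi
    return np.array([(f[r0:r1, c0:c1] < threshold).sum() for f in frames])

def is_speaking(counts, std_thresh=5.0):
    """Speaking intervals show large deviation of the dark-pixel count."""
    return counts.std() > std_thresh

roi = (0, 20, 0, 20)
# Silent: mouth closed, few dark pixels, stable count.
silent = [np.full((20, 20), 150) for _ in range(30)]
# Speaking: the dark oral cavity opens and closes over time.
speaking = [np.full((20, 20), 150) for _ in range(30)]
for t, f in enumerate(speaking):
    f[5:5 + (t % 10), 5:15] = 20          # dark region of varying size

print(is_speaking(dark_pixel_counts(silent, roi)),
      is_speaking(dark_pixel_counts(speaking, roi)))
```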

Proceedings ArticleDOI
14 May 2006
TL;DR: A novel temporal video segmentation method that, in addition to abrupt cuts, can detect with very high accuracy gradual transitions such as dissolves, fades and wipes is proposed.
Abstract: A novel temporal video segmentation method that, in addition to abrupt cuts, can detect with very high accuracy gradual transitions such as dissolves, fades and wipes is proposed. The method relies on evaluating mutual information between multiple pairs of frames within a certain temporal frame window. This way we create a graph where the frames are nodes and the measures of similarity correspond to the weights of the edges. By finding and disconnecting the weak connections between nodes we separate the graph to subgraphs ideally corresponding to the shots. Experiments on TRECVID2004 video test set containing different types of shot transitions and significant object and camera motion inside the shots prove that the method is very efficient.

Proceedings ArticleDOI
01 Dec 2006
TL;DR: Two fingerprinting approaches are reviewed in this paper: an image fingerprinting technique that makes use of color-based descriptors, R-trees and linear discriminant analysis (LDA) and a video fingerprinting method that utilizes information about the appearances of actors in videos along with an efficient search strategy.
Abstract: Multimedia fingerprinting, also known as robust/perceptual hashing and replica detection, is an emerging technology that can be used as an alternative to watermarking for the efficient Digital Rights Management (DRM) of multimedia data. Two fingerprinting approaches are reviewed in this paper. The first is an image fingerprinting technique that makes use of color-based descriptors, R-trees and Linear Discriminant Analysis (LDA). The second is a video fingerprinting method that utilizes information about the appearances of actors in videos along with an efficient search strategy. Experimental performance analysis is provided for both methods.

Book ChapterDOI
01 Jan 2006
TL;DR: The results show that the method is very efficient in locating the face regions and in the recognition of portrait paintings and the performance of the algorithm is encouraging for its further development, which includes the extraction of portrait-specific semantic information.
Abstract: This paper presents a method for automatic annotation of portraits in art image databases and discusses the extraction of semantic information from portraits. The proposed method segments images into candidate regions and fits an ellipse and a bounding box to them. Their extracted features serve as input to a neural network, which is trained to distinguish between face and non-face regions. Paintings containing face regions are classified as portraits. The method evaluation is done on a set of 188 digital paintings using ROC curves as performance measures. The results show that the method is very efficient in locating the face regions and in the recognition of portrait paintings. The performance of the algorithm is encouraging for its further development, which includes the extraction of portrait-specific semantic information.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: The discriminant non-negative matrix factorization (DNMF) algorithm is applied at the image of the last frame of the video sequence, corresponding to the greatest intensity of the facial expression, thus extracting the texture information.
Abstract: A novel method based on geometrical and texture information is proposed for facial expression recognition from video sequences. The discriminant non-negative matrix factorization (DNMF) algorithm is applied to the image of the last frame of the video sequence, corresponding to the greatest intensity of the facial expression, thus extracting the texture information. A support vector machines (SVMs) system is used for the classification of the geometrical information derived from tracking the Candide grid over the video sequence. The geometrical information consists of the differences of the node coordinates between the neutral (first) and the fully expressed facial expression (last) video frame. The fusion of the texture and geometrical information obtained is performed using SVMs. The accuracy achieved is 98.7% when recognizing the six basic facial expressions.

Book ChapterDOI
21 Aug 2006
TL;DR: This paper proposes a complete framework for accurate face localization on video frames by combining detection and forward tracking according to predefined rules and using a dynamic programming algorithm to select the candidates that minimize a specific cost function.
Abstract: This paper proposes a complete framework for accurate face localization on video frames. Detection and forward tracking are first combined according to predefined rules to get a first set of face candidates. Backward tracking is then applied to provide another set of possible localizations. Finally a dynamic programming algorithm is used to select the candidates that minimize a specific cost function. This method was designed to handle different scale, pose and lighting conditions. The experiments show that it improves the face detection rate compared to a frame-based detector and provides a higher precision than a forward information-based tracker.

Proceedings Article
01 Sep 2006
TL;DR: The method has been tested on a variety of sequences with very good results, including a database of video sequences representing human faces changing from the neutral state to the one that represents a fully formed human facial expression.
Abstract: This paper presents a method for generalizing human facial expressions or personalizing (cloning) them from one person to completely different persons, by means of a statistical analysis of human facial expressions coming from various persons. The data used for the statistical analysis are obtained by tracking a generic facial wireframe model in video sequences depicting the formation of the different human facial expressions, starting from a neutral state. Wireframe node tracking is performed by a pyramidal variant of the well-known Kanade-Lucas-Tomasi (KLT) tracker. The loss of tracked features is handled through a model deformation procedure increasing the robustness of the tracking algorithm. The dynamic facial expression output model is MPEG-4 compliant. The method has been tested on a variety of sequences with very good results, including a database of video sequences representing human faces changing from the neutral state to the one that represents a fully formed human facial expression.

Proceedings ArticleDOI
14 May 2006
TL;DR: This work has developed a method that uses the pre-extracted output of face detection and recognition to perform fast semantic indexing and retrieval of video segments.
Abstract: The extraction of a digital signature from a video segment in order to uniquely identify it, is often a necessary prerequisite for video indexing, copyright protection and other tasks. Semantic video signatures are those that are based on high-level content information rather than on low-level features of the video stream, their major advantage being that they are invariant to nearly all types of distortion. Since a major semantic feature of a video is the appearance of specific people in specific frames, we have developed a method that uses the pre-extracted output of face detection and recognition to perform fast semantic indexing and retrieval of video segments. We give the results of the experimental evaluation of our method on an artificial database created using a probabilistic model of the creation of video.

Proceedings ArticleDOI
09 Jul 2006
TL;DR: A novel method on how to take advantage of the snake representation of target objects, when doing chamfer matching for detection/recognition purposes, and the possibility of involving fewer pixels from both the target and template object to speed up computations is investigated.
Abstract: In this paper we present a novel method for exploiting the snake representation of target objects when performing chamfer matching for detection/recognition purposes. In this case several time-consuming steps of classic chamfer matching approaches can be simplified. Moreover, we investigate the possibility of involving fewer pixels from both the target and template object to speed up computations. We introduce an optimization method for such an object reduction, which is valid also in the general application scheme of chamfer matching. Finally, we present our experimental results regarding human body detection.

Book ChapterDOI
11 Sep 2006
TL;DR: A novel framework for dialogue detection based on indicator functions is investigated, in which two detection rules compare, respectively, the cross-correlation at zero time lag and the cross-power in a particular frequency band to a threshold.
Abstract: In this paper, we investigate a novel framework for dialogue detection that is based on indicator functions. An indicator function defines that a particular actor is present at each time instant. Two dialogue detection rules are developed and assessed. The first rule relies on the value of the cross-correlation function at zero time lag that is compared to a threshold. The second rule is based on the cross-power in a particular frequency band that is also compared to a threshold. Experiments are carried out in order to validate the feasibility of the aforementioned dialogue detection rules by using ground-truth indicator functions determined by human observers from six different movies. A total of 25 dialogue scenes and another 8 non-dialogue scenes are employed. The probabilities of false alarm and detection are estimated by cross-validation, where 70% of the available scenes are used to learn the thresholds employed in the dialogue detection rules and the remaining 30% of the scenes are used for testing. An almost perfect dialogue detection is reported for every distinct threshold.
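The first rule can be sketched on synthetic indicator functions. The normalized zero-lag cross-correlation below, and the alternating-turn toy indicators, are one plausible reading of the rule for illustration — the paper's exact normalization and thresholds may differ:

```python
import numpy as np

def zero_lag_xcorr(ind_a, ind_b):
    """Normalized cross-correlation of two indicator functions at lag 0."""
    a = ind_a - ind_a.mean()
    b = ind_b - ind_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

# Hypothetical ground-truth indicators (1 = actor present at that instant).
t = np.arange(100)
actor_a = ((t // 10) % 2 == 0).astype(float)   # present on even 10-frame turns
actor_b = 1.0 - actor_a                         # replies on the odd turns
absent = np.zeros(100)                          # actor never present

print(zero_lag_xcorr(actor_a, actor_b))   # -1.0: perfect turn-taking
print(zero_lag_xcorr(actor_a, absent))    # 0.0: no interaction
```

A dialogue detector would flag scene pairs whose correlation magnitude exceeds a threshold learned from the training scenes.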

Proceedings ArticleDOI
21 May 2006
TL;DR: A way to significantly improve the speed of the algorithm convergence by constructing initial basis images that meet the sparseness and orthogonality requirements and approximate the final minimization solution is provided.
Abstract: A subspace supervised learning algorithm named discriminant non-negative matrix factorization (DNMF) has been recently proposed for classifying human facial expressions. It decomposes images into a set of basis images and corresponding coefficients. Usually, the algorithm starts with random basis image and coefficient initialization. Then, at each iteration, both basis images and coefficients are updated to minimize the underlying cost function. The algorithm may need several thousands of iterations to obtain cost function minimization. We provide a way to significantly improve the speed of the algorithm convergence by constructing initial basis images that meet the sparseness and orthogonality requirements and approximate the final minimization solution. To experimentally evaluate the new approach, we have applied DNMF using the random and the proposed initialization procedure to recognize six basic facial expressions. While fewer iteration steps are needed with the proposed initialization, the recognition accuracy remains within satisfactory levels.

Proceedings ArticleDOI
09 Jul 2006
TL;DR: A novel system for image replica detection that uses color-based descriptors in order to extract robust features for image representation is presented and is enhanced with linear discriminant analysis (LDA).
Abstract: In this paper a novel system for image replica detection is presented. The system uses color-based descriptors in order to extract robust features for image representation. These features are used for indexing the images in a database using an R-Tree. When a query about whether a test image is a replica of an image in the database is submitted, the R-Tree is traversed and a set of candidate images is retrieved. Then, in order to obtain a single result and at the same time reduce the number of decision errors the system is enhanced with Linear Discriminant Analysis (LDA). The conducted experiments show that the proposed approach is very promising.

Book ChapterDOI
21 Aug 2006
TL;DR: The experimental analysis reported herein provides evidence for the usefulness of the proposed approach and motivates the further development of linguistics-related tools as a means of analysing biological sequences.
Abstract: Within this paper we are proposing and testing a new strategy for detection and measurement of similarity between sequences of proteins. Our approach has its roots in computational linguistics and the related techniques for quantifying and comparing content in strings of characters. The pairwise comparison of proteins relies on the content regularities expected to uniquely characterize each sequence. These regularities are captured by n-gram based modelling techniques and exploited by cross-entropy related measures. In this new attempt to incorporate theoretical ideas from computational linguistics into the field of bioinformatics, we experimented using two implementations having always as ultimate goal the development of practical, computationally efficient algorithms for expressing protein similarity. The experimental analysis reported herein provides evidence for the usefulness of the proposed approach and motivates the further development of linguistics-related tools as a means of analysing biological sequences.
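The n-gram/cross-entropy machinery can be sketched as follows: build a smoothed bigram model of one protein sequence and score another sequence by its average negative log-likelihood, so that similar sequences yield lower cross-entropy. The bigram order, add-one smoothing and toy sequences are illustrative assumptions, not the paper's exact models:

```python
import numpy as np
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids

def bigram_model(seq, alpha=1.0):
    """Add-one-smoothed bigram probabilities P(b | a) over amino acids."""
    counts = Counter(zip(seq, seq[1:]))
    totals = Counter(seq[:-1])
    return {(a, b): (counts[(a, b)] + alpha) / (totals[a] + alpha * len(AMINO))
            for a in AMINO for b in AMINO}

def cross_entropy(seq, model):
    """Average negative log2-likelihood of seq's bigrams under model."""
    pairs = list(zip(seq, seq[1:]))
    return -sum(np.log2(model[p]) for p in pairs) / len(pairs)

x = "MKVLAAGMKVLAAGMKVLAAG" * 3     # repetitive toy "protein"
y = "MKVLAAGMKVLSAGMKVLAAG" * 3     # near-copy with one substitution per repeat
z = "GAALVKMGAALVKM" * 4            # same letters, reversed motif

mx = bigram_model(x)
print(cross_entropy(y, mx), cross_entropy(z, mx))   # y scores lower than z
```

Lower cross-entropy under a sequence's model serves as the similarity signal; the paper explores such measures for pairwise protein comparison.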


01 Jan 2006
TL;DR: The paper describes the approach as well as the performance analysis of the Artificial Intelligence and Information Analysis laboratory approach for shot boundary detection as applied to the TRECVID 2006 video retrieval benchmark.
Abstract: In this paper, we describe the Artificial Intelligence and Information Analysis (AIIA) laboratory approach for shot boundary detection as applied to the TRECVID 2006 video retrieval benchmark. The paper describes the approach as well as the performance analysis. The method relies on evaluating mutual information between multiple pairs of frames within a certain temporal window. The performance of the method on the benchmark data was in general very satisfactory.

Book ChapterDOI
18 May 2006
TL;DR: This paper examines the use of color descriptors based on a 24-color quantized palette for image fingerprinting, and comparisons are provided between different similarity measures as well as between color-only and spatial chromatic histograms.
Abstract: Image fingerprinting systems aim to extract unique and robust image descriptors (in analogy to human fingerprints). They search for images that are not merely perceptually similar but replicas of an image generated through mild image processing operations. In this paper, we examine the use of color descriptors based on a 24-color quantized palette for image fingerprinting. Comparisons are provided between different similarity measures as well as between color-only and spatial chromatic histograms.

01 Oct 2006
TL;DR: A system is presented that uses the information fusion paradigm to integrate two different sorts of information in order to improve facial expression classification accuracy over single-feature-based classification.
Abstract: The paper presents a system that uses the information fusion paradigm to integrate two different sorts of information in order to improve facial expression classification accuracy over single-feature-based classification. The Discriminant Non-negative Matrix Factorization (DNMF) approach is used to extract a first set of features, and an automatic geometry-based feature extraction algorithm is used for retrieving the second set of features. These features are then concatenated into a single feature vector at the feature level. Experiments showed that, when these mixed features are used for classification, the classification accuracy is improved compared with the case when only one type of these features is used.

Book ChapterDOI
21 Aug 2006
TL;DR: The Discriminant Non-negative Matrix Factorization (DNMF) algorithm is applied to the image corresponding to the greatest intensity of the facial expression (last frame of the video sequence), extracting in that way the texture information.
Abstract: A novel method based on shape and texture information is proposed in this paper for facial expression recognition from video sequences. The Discriminant Non-negative Matrix Factorization (DNMF) algorithm is applied to the image corresponding to the greatest intensity of the facial expression (last frame of the video sequence), extracting in that way the texture information. A Support Vector Machines (SVMs) system is used for the classification of the shape information derived from tracking the Candide grid over the video sequence. The shape information consists of the differences of the node coordinates between the first (neutral) and last (fully expressed facial expression) video frame. Subsequently, fusion of the texture and shape information obtained is performed using Radial Basis Function (RBF) Neural Networks (NNs). The accuracy achieved is equal to 98.2% when recognizing the six basic facial expressions.