Showing papers by "Ioannis Pitas" published in 2004


Proceedings ArticleDOI
17 May 2004
TL;DR: The major contribution of the paper is to rate the discriminating capability of a set of features for emotional speech recognition, with the aim of building a tool that can be used in psychology to automatically classify utterances into five emotional states.
Abstract: Our purpose is to design a useful tool which can be used in psychology to automatically classify utterances into five emotional states: anger, happiness, neutral, sadness, and surprise. The major contribution of the paper is to rate the discriminating capability of a set of features for emotional speech recognition. A total of 87 features has been calculated over 500 utterances from the Danish Emotional Speech database. The sequential forward selection method (SFS) has been used in order to discover a set of 5 to 10 features which are able to classify the utterances in the best way. The criterion used in SFS is the cross-validated correct classification score of one of the following classifiers: nearest mean and Bayes classifier where class pdfs are approximated via Parzen windows or modelled as Gaussians. After selecting the 5 best features, we reduce the dimensionality to two by applying principal component analysis. The result is a correct classification rate of 51.6% ± 3% (95% confidence interval) for the five aforementioned emotions, whereas a random classification would give a correct classification rate of 20%. Furthermore, we identify the two-class emotion recognition problems whose error rates contribute heavily to the average error, and we indicate that a possible reduction of the error rates reported in this paper could be achieved by employing two-class classifiers and combining them.

225 citations
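The sequential forward selection (SFS) step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the toy data, the function names, and the use of leave-one-out cross-validation with a nearest mean classifier are hypothetical choices, not the paper's actual features, database, or experimental protocol.

```python
import math

def nearest_mean_predict(train_x, train_y, sample):
    """Classify `sample` by the class whose mean vector is closest (Euclidean)."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        mean = [v / counts[label] for v in acc]
        dist = sum((a - b) ** 2 for a, b in zip(sample, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_score(data, labels, features):
    """Leave-one-out correct classification rate using only `features`."""
    proj = [[row[f] for f in features] for row in data]
    hits = 0
    for i in range(len(proj)):
        train_x = proj[:i] + proj[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        hits += nearest_mean_predict(train_x, train_y, proj[i]) == labels[i]
    return hits / len(proj)

def sequential_forward_selection(data, labels, n_select):
    """Greedily add the feature that most improves the cross-validated score."""
    selected = []
    remaining = list(range(len(data[0])))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: loo_score(data, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: feature 1 separates the classes, features 0 and 2 do not.
data = [[0.3, 0.0, 5.0], [0.9, 0.2, 4.0], [0.1, 0.1, 6.0],
        [0.8, 1.0, 5.5], [0.2, 1.1, 4.5], [0.5, 0.9, 5.0]]
labels = ["a", "a", "a", "b", "b", "b"]
print(sequential_forward_selection(data, labels, 1))
```

SFS is greedy: once a feature is added it is never removed, which keeps the number of classifier evaluations manageable when the candidate set is large (87 features in the paper).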


Proceedings ArticleDOI
23 Aug 2004
TL;DR: It is found that, for the first database, LNMF outperforms both PCA and NMF, while NMF produces the poorest recognition performance; results are approximately the same for the second database, with a slight performance improvement in favor of NMF.
Abstract: Two image representation approaches called non-negative matrix factorization (NMF) and local non-negative matrix factorization (LNMF) have been applied to two facial databases for recognizing six basic facial expressions. A principal component analysis (PCA) approach was performed as well for facial expression recognition for comparison purposes. We found that, for the first database, LNMF outperforms both PCA and NMF, while NMF produces the poorest recognition performance. Results are approximately the same for the second database, with a slight performance improvement in favor of NMF.

111 citations


Journal ArticleDOI
TL;DR: This work presents a method for embedding and detecting watermarks in vector graphics images containing polygonal lines, and presents results from simulated attacks.
Abstract: Digital products are easy to copy, reproduce, and maliciously process in a network environment. Over the past decade, watermarking has emerged as an important technology for protecting copyright of multimedia products. We present a method for embedding and detecting watermarks in vector graphics images containing polygonal lines, and present results from simulated attacks. Our proposed algorithm embeds watermarks in polygonal lines describing image contours. These contours are usually described in vector format. Our watermarking method slightly modifies the vertex coordinates of the polygonal line. We embed the watermark in the magnitude of the curve's Fourier descriptors to exploit its location, scale, and rotation invariant properties. Our technique has certain similarities to bitmap image watermarking in the discrete Fourier transform (DFT) domain. However, it's essentially different because it can be applied to vector rather than bitmap images. Furthermore, Fourier descriptor watermarking problems differ from DFT bitmap image watermarking problems.

102 citations
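The invariance that the watermarking abstract above relies on can be checked numerically. The sketch below is not the paper's embedding or detection scheme; it only illustrates, under the assumption that each vertex (x, y) is treated as the complex number x + jy, that the Fourier descriptor magnitudes (with the DC term dropped and normalised by the first harmonic) do not change under translation, rotation, and uniform scaling. The contour and all function names are hypothetical.

```python
import cmath
import math

def fourier_descriptors(vertices):
    """DFT of the polygonal line, with each vertex (x, y) treated as x + jy."""
    n = len(vertices)
    z = [complex(x, y) for x, y in vertices]
    return [sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
            for u in range(n)]

def invariant_magnitudes(vertices):
    """Descriptor magnitudes with the DC term dropped and scale normalised."""
    mags = [abs(c) for c in fourier_descriptors(vertices)]
    # mags[0] depends on the curve position (translation), so it is skipped;
    # dividing by mags[1] removes the dependence on uniform scaling.
    return [m / mags[1] for m in mags[1:]]

def transform(vertices, angle, scale, shift):
    """Rotate, scale, and translate a polygonal line."""
    rot = cmath.exp(1j * angle) * scale
    out = []
    for x, y in vertices:
        w = complex(x, y) * rot + complex(*shift)
        out.append((w.real, w.imag))
    return out

contour = [(0, 0), (4, 0), (5, 2), (3, 4), (0, 3)]
moved = transform(contour, angle=0.7, scale=2.5, shift=(10, -3))
original_mags = invariant_magnitudes(contour)
moved_mags = invariant_magnitudes(moved)
# The largest difference should be at the level of floating-point rounding.
print(max(abs(p - q) for p, q in zip(original_mags, moved_mags)))
```

Rotation multiplies every descriptor by a unit-modulus constant and translation only changes the DC term, which is why a watermark carried by the magnitudes survives these geometric operations.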


Proceedings ArticleDOI
29 Sep 2004
TL;DR: This paper found that the newly proposed algorithm discriminant non-negative matrix factorization (DNMF) shows superior performance by achieving a higher recognition rate, when compared to NMF and LNMF.
Abstract: In this paper, we present a novel algorithm for learning facial expressions in a supervised manner. This algorithm is derived from the local non-negative matrix factorization (LNMF) algorithm, which is an extension of non-negative matrix factorization (NMF) method. We call this newly proposed algorithm discriminant non-negative matrix factorization (DNMF). Given an image database, all these three algorithms decompose the database into basis images and their corresponding coefficients. This decomposition is computed differently for each method. The decomposition results are applied on facial images for the recognition of the six basic facial expressions. We found that our algorithm shows superior performance by achieving a higher recognition rate, when compared to NMF and LNMF.

89 citations
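The decomposition into basis images and coefficients mentioned in the abstract above can be illustrated with plain NMF. The sketch below is an assumption-laden toy: it uses the well-known multiplicative update rules for the squared-error objective on a tiny hypothetical matrix and reproduces neither LNMF nor the proposed DNMF. Columns of `v` stand for vectorised images, columns of `w` for basis images, and columns of `h` for the corresponding coefficients.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise non-negative V (m x n) as W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h

# Hypothetical 4-pixel "images" stored as columns; the third is the sum of the first two.
v = [[1.0, 0.0, 1.0], [2.0, 0.0, 2.0], [0.0, 3.0, 3.0], [0.0, 1.0, 1.0]]
w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

Because the updates only multiply by non-negative ratios, `w` and `h` stay non-negative throughout, which is the property that gives NMF its additive, parts-based representation.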


Journal ArticleDOI
TL;DR: The proposed system is built on hidden Markov models that enable the modeling of character sequences that provide the means for language tracking, that is, language identification across the segments of a multilingual document.

45 citations


Journal ArticleDOI
TL;DR: A joint probabilistic face detection and tracking algorithm, combining likelihood estimation and a prior probability, is proposed, which has been tested on real image sequences and is robust to significant partial occlusion and illumination changes.
Abstract: A joint probabilistic face detection and tracking algorithm, combining likelihood estimation and a prior probability, is proposed. The likelihood estimation scheme is based on the statistical training of sets of automatically generated feature points and a mutual information tracking cue, while the prior probability estimation is based on a Gaussian temporal model. The likelihood estimation process is the core of a multiple face detection scheme used to initialize the tracking process. The resulting system has been tested on real image sequences and is robust to significant partial occlusion and illumination changes.

36 citations


Proceedings ArticleDOI
04 Oct 2004
TL;DR: A combined scheme of linear prediction analysis for feature extraction along with linear projection methods for feature reduction followed by known pattern recognition methods for the purpose of discriminating between normal and pathological voice samples is proposed.
Abstract: In this paper we propose a combined scheme of linear prediction analysis for feature extraction along with linear projection methods for feature reduction followed by known pattern recognition methods for the purpose of discriminating between normal and pathological voice samples. Two different cases of speech under vocal fold pathology are examined: vocal fold paralysis and vocal fold edema. Three known classifiers are tested and compared in both cases, namely the Fisher linear discriminant, the k-nearest neighbor classifier, and the nearest mean classifier. The performance of each classifier is evaluated in terms of the probabilities of false alarm and detection or the receiver operating characteristic. The datasets used are part of a database of disordered speech developed by the Massachusetts Eye and Ear Infirmary. The experimental results indicate that vocal fold paralysis and edema can easily be detected by any of the aforementioned classifiers.

33 citations


Journal ArticleDOI
TL;DR: An evaluation of digital radiograph registration and subtraction software for a sensitive and reliable assessment of the progress of chronic apical periodontitis found that changes to the periapical tissue structure were easily detectable, even during short time intervals.

31 citations


Journal ArticleDOI
TL;DR: This paper deals with the statistical analysis of the behavior of a blind robust watermarking system based on pseudorandom signals embedded in the magnitude of the Fourier transform of the host data.
Abstract: This paper deals with the statistical analysis of the behavior of a blind robust watermarking system based on pseudorandom signals embedded in the magnitude of the Fourier transform of the host data. The host data that the watermark is embedded into is one-dimensional and non-white, following a specific probability model. The analysis performed involves theoretical evaluation of the statistics of the Fourier coefficients and the design of an optimal detector for multiplicative watermark embedding. Finally, experimental results are presented in order to show the performance of the proposed detector versus that of the correlator detector.

26 citations


Journal ArticleDOI
TL;DR: The self-organizing map algorithm, which has been used successfully in document organization, is proposed for document retrieval as well, and a variant is tested in which the linear Least Mean Squares adaptation rule is replaced with the marginal median.

21 citations


Journal ArticleDOI
TL;DR: Vector rational interpolation is studied for estimating erroneously received motion fields of MPEG-2 predictively coded frames; the resulting temporal error concealment methods perform satisfactorily for packet error rates up to 2% and surpass other state-of-the-art temporal concealment methods that also attempt to estimate unavailable motion information and perform concealment afterwards.
Abstract: A study on the use of vector rational interpolation for the estimation of erroneously received motion fields of MPEG-2 predictively coded frames is undertaken in this paper, aiming further at error concealment (EC). Various rational interpolation schemes have been investigated, some of which are applied to different interpolation directions. One scheme additionally uses the boundary matching error and another one attempts to locate the direction of minimal/maximal change in the local motion field neighborhood. Another one further adopts bilinear interpolation principles, whereas a last one additionally exploits available coding mode information. The methods are temporal EC methods for predictively coded frames or frames for which motion information pre-exists in the video bitstream. Their main advantages are their capability to adapt their behavior with respect to neighboring motion information, by switching from linear to nonlinear behavior, and their real-time implementation capabilities, which make them suitable for real-time decoding applications. They are easily embedded in the decoder model to achieve concealment along with decoding and avoid post-processing delays. Their performance proves to be satisfactory for packet error rates up to 2% and for video sequences with different content and motion characteristics, and it surpasses that of other state-of-the-art temporal concealment methods that also attempt to estimate unavailable motion information and perform concealment afterwards.

Proceedings ArticleDOI
23 Aug 2004
TL;DR: The efficiency of three tracking reliability metrics based on information theory and normalized correlation is examined, and experimental results show that the information theory based metrics perform better than the normalized correlation one.
Abstract: The efficiency of three tracking reliability metrics based on information theory and normalized correlation is examined in this paper. The two information theory tools used for the metrics construction are the mutual information and the Kullback-Leibler distance. The metrics are applicable to any feature-based tracking scheme. In the context of this work, they are applied for comparison purposes on an object tracking scheme using multiple feature point correspondences. Experimental results have shown that the information theory based metrics perform better than the normalized correlation one.
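The two information theory tools named in the abstract above can be computed directly from histograms. The following is a minimal sketch, assuming that the tracked region is represented by a short sequence of quantised intensity values; the sample sequences and function names are hypothetical, and the way the paper turns these quantities into reliability metrics is not reproduced.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # I(A;B) = sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts in place of p.
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def kl_distance(p, q):
    """Kullback-Leibler distance D(p || q) in bits.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical quantised intensities of a reference region and two later observations.
reference = [0, 0, 1, 1, 2, 2, 3, 3]
unchanged = list(reference)
unrelated = [0, 1, 0, 1, 0, 1, 0, 1]

print(mutual_information(reference, unchanged))  # high: the region still matches
print(mutual_information(reference, unrelated))  # low: the region content changed
print(kl_distance([0.5, 0.5], [0.25, 0.75]))
```

A drop in mutual information between the reference region and the currently tracked region indicates that the two no longer share structure, which is the intuition behind using it as a reliability cue.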


Proceedings ArticleDOI
24 Oct 2004
TL;DR: A novel method for blind 3D mesh model watermarking applications is proposed that is robust against 3D translation, scaling and mesh simplifications.
Abstract: In this paper, a novel method for blind 3D mesh model watermarking applications is proposed. The method is robust against 3D translation, scaling and mesh simplifications. A pseudo-random watermarking signal is cast in the 3D mesh model by geometrically deforming its vertices, without altering the vertex topology. Prior to embedding and detection, a set of simple transforms is applied to the 3D mesh model. Each sample of the watermark sequence is embedded in a set of vertices rather than in a single vertex in order to deal with mesh simplifications. Experimental results indicate the ability of the proposed method to deal with the aforementioned attacks.

Journal ArticleDOI
01 Dec 2004
TL;DR: Metrics measuring tracking reliability under occlusion that are based on mutual information and do not resort to ground truth data are proposed and tested on an object tracking scheme using multiple feature point correspondences.
Abstract: Metrics measuring tracking reliability under occlusion that are based on mutual information and do not resort to ground truth data are proposed. Metrics for both the initialisation of the region to be tracked as well as for measuring the performance of the tracking algorithm are presented. The metrics variations may be interpreted as a quantitative estimate of changes in the tracking region due to occlusion, sudden movement or deformation of the tracked object. Performance metrics based on the Kullback-Leibler distance and normalised correlation were also added for comparison purposes. The proposed approach was tested on an object tracking scheme using multiple feature point correspondences. Experimental results have shown that mutual information can effectively characterise object appearance and reappearance in many computer vision applications.

Proceedings Article
01 Sep 2004
TL;DR: The chaotic watermarking framework is applied successfully to audio signals, demonstrating its superiority with respect to both robustness and inaudibility.
Abstract: In this paper, an overview of watermarking schemes based on chaotic generators and correlation detection is presented. Statistical properties of watermark sequences generated by piecewise-linear Markov maps are exploited for both additive and multiplicative watermark embedding. Correlation/spectral properties of such sequences are easily controllable, a fact that reflects on the watermarking system performance. A family of chaotic maps, namely the skew tent map family, is used in temporal and transform-domain watermarking schemes. The chaotic watermarking framework is applied successfully to audio signals, demonstrating its superiority with respect to both robustness and inaudibility.
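The pipeline outlined in the abstract above (a chaotic generator, additive embedding, and correlation detection) can be sketched as follows. This is a toy under stated assumptions: the host signal is a synthetic sine rather than audio, the initial condition of the map plays the role of the key, and the embedding strength is deliberately exaggerated so that the detector gap is visible on a short signal. All parameter values and function names are hypothetical.

```python
import math

def skew_tent_sequence(x0, alpha, length):
    """Iterate the skew tent map on (0, 1) and return a roughly zero-mean sequence."""
    x, out = x0, []
    for _ in range(length):
        x = x / alpha if x < alpha else (1.0 - x) / (1.0 - alpha)
        out.append(x - 0.5)  # the map's invariant density is uniform, so its mean is 0.5
    return out

def embed_additive(host, watermark, strength):
    """Additive embedding: marked = host + strength * watermark."""
    return [h + strength * w for h, w in zip(host, watermark)]

def correlate(signal, watermark):
    """Correlation detector response (mean of the sample-wise product)."""
    return sum(s * w for s, w in zip(signal, watermark)) / len(signal)

N = 20000
host = [math.sin(0.01 * i) for i in range(N)]
key, wrong_key = 0.123456, 0.654321

marked = embed_additive(host, skew_tent_sequence(key, 0.3, N), strength=0.5)
response_right = correlate(marked, skew_tent_sequence(key, 0.3, N))
response_wrong = correlate(marked, skew_tent_sequence(wrong_key, 0.3, N))
print(response_right, response_wrong)
```

With the correct key the response is dominated by the watermark energy term, while a different initial condition yields a sequence that is nearly uncorrelated with the embedded one, so the response stays close to zero. The skew parameter `alpha` shapes the correlation and spectral properties of the sequence, which is the controllability the abstract refers to.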

Proceedings ArticleDOI
23 May 2004
TL;DR: This paper presents an accurate, very fast approach for the deformations of 2D physically based shape models representing open and closed curves that overcome the main shortcoming of other deformable models, i.e. computation time.
Abstract: This paper presents an accurate, very fast approach for the deformations of 2D physically based shape models representing open and closed curves. The introduced models overcome the main shortcoming of other deformable models, i.e. computation time. The approach relies on the determination of explicit deformation governing equations that involve neither eigenvalue decomposition nor any other computationally intensive numerical operation. The approach was evaluated and compared with another fast and accurate physics-based deformable shape model, both in terms of deformation accuracy and computation time. The conclusion is that the introduced model is completely accurate and is deformed very fast on current personal computers.

Proceedings ArticleDOI
27 Jun 2004
TL;DR: It is shown that speaker gender or stress are not crucial criteria for voiced syllables, whereas distance to silence has to be taken into account, and such syllable databases are exploited in an alignment system based on dynamic time warping for modern Greek speech.
Abstract: We present some results on the optimal design of syllable databases for modern Greek. We show that speaker gender or stress are not crucial criteria for voiced syllables, whereas distance to silence has to be taken into account. Such syllable databases are exploited in an alignment system based on dynamic time warping for modern Greek speech. The overall architecture of the alignment system is also presented.
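Dynamic time warping, on which the alignment system in the abstract above is based, can be sketched in a few lines. The example assumes one scalar feature per frame and an absolute-difference local cost; a speech aligner would typically compare feature vectors per frame, and the paper's actual features and path constraints are not given here, so these are illustrative choices only.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical frame features: the second sequence is a time-stretched copy of the first.
template = [1, 2, 3]
stretched = [1, 2, 2, 3]
print(dtw_distance(template, stretched))
```

Because a frame of one sequence may be matched to several frames of the other, a stored syllable template can be aligned with an utterance spoken at a different rate.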

Proceedings ArticleDOI
27 Jun 2004
TL;DR: The method generates the appropriate data, as specified in the MPEG-4 standard, such as the face definition parameters (FDPs), the face animation parameters (FAPs) and the facial animation table (FAT) from an animated face model in Maya.
Abstract: This work presents a method for extracting facial information encoded in MPEG-4 format from an animated face model in Maya. The method generates the appropriate data, as specified in the MPEG-4 standard, such as the face definition parameters (FDPs), the face animation parameters (FAPs) and the facial animation table (FAT). The described procedure was implemented as a plug-in for Maya, which requires as inputs only the animated face model and the correspondence between its vertices and the FDPs. The extracted data were tested on a publicly available FAP-player in order to demonstrate the faithful reproduction of the face animation originally created with the high-end 3D graphics modeller. The capabilities of the publicly available FAP player have also been augmented in order to enable loading any proprietary face model and its corresponding FAT.