
Showing papers by "Ioannis Pitas" published in 2013


Book
31 Jan 2013
TL;DR: This book surveys algorithms and architectures for image and signal processing based on order statistics and homomorphic filtering, including adaptive nonlinear filters and median filters.
Abstract: 1. Introduction.- 2. Statistical preliminaries.- 3. Image formation.- 4. Median filters.- 5. Digital filters based on order statistics.- 6. Morphological image and signal processing.- 7. Homomorphic filters.- 8. Polynomial filters.- 9. Adaptive nonlinear filters.- 10. Generalizations and new trends.- 11. Algorithms and architectures.

974 citations


Journal ArticleDOI
TL;DR: The proposed method can successfully operate in situations that may appear in real application scenarios, since it does not set any assumption concerning the visual scene background and the camera view angle.
Abstract: In this paper, we propose a novel method aiming at view-independent human action recognition. Action description is based on local shape and motion information appearing at spatiotemporal locations of interest in a video. Action representation involves fuzzy vector quantization, while action classification is performed by a feedforward neural network. A novel classification algorithm, called minimum class variance extreme learning machine, is proposed in order to enhance the action classification performance. The proposed method can successfully operate in situations that may appear in real application scenarios, since it does not set any assumption concerning the visual scene background and the camera view angle. Experimental results on five publicly available databases, aiming at different application scenarios, denote the effectiveness of both the adopted action recognition approach and the proposed minimum class variance extreme learning machine algorithm.

104 citations
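The fuzzy vector quantization step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes fuzzy c-means style memberships with fuzzification parameter `m`, and it averages the membership vectors over a video to obtain a fixed-length representation. The function names, the codebook, and the averaging step are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def memberships(x, codebook, m=2.0, eps=1e-12):
    """Fuzzy memberships of vector x to each codebook vector (assumed FCM-style rule)."""
    dists = [math.dist(x, c) + eps for c in codebook]   # eps avoids division by zero
    weights = [d ** (-2.0 / (m - 1.0)) for d in dists]  # closer codewords weigh more
    total = sum(weights)
    return [w / total for w in weights]                 # normalized to sum to 1

def video_representation(descriptors, codebook, m=2.0):
    """Average the membership vectors of all descriptors in one video."""
    rows = [memberships(x, codebook, m) for x in descriptors]
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(codebook))]

# Hypothetical 2-D descriptors and a 3-word codebook.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
video = [[0.1, 0.0], [0.9, 0.1]]
print(video_representation(video, codebook))
```

The resulting vector has one entry per codebook vector regardless of video length, which is what lets a fixed-size classifier such as a feedforward network consume it.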


Journal ArticleDOI
TL;DR: A view-independent action recognition method exploiting a low computational-cost volumetric action representation obtained by exploiting the circular shift invariance property of the magnitudes of the Discrete Fourier Transform coefficients is presented.

67 citations
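The circular shift invariance property named in the TL;DR above is a general fact about the Discrete Fourier Transform: rotating a sequence changes only the phases of its DFT coefficients, not their magnitudes. The sketch below checks this numerically. Treating the sequence as one feature value per camera position on a circle is an assumption made only for illustration; the paper's actual volumetric representation is not described here.

```python
import cmath

def dft_magnitudes(signal):
    """Return |X[k]| for every DFT coefficient of the sequence."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]

def circular_shift(signal, s):
    """Rotate the sequence by s positions."""
    s %= len(signal)
    return signal[-s:] + signal[:-s]

# Hypothetical feature values, one per viewing angle around the person.
views = [0.2, 0.9, 0.4, 0.1, 0.7, 0.3, 0.8, 0.5]
shifted = circular_shift(views, 3)  # same content, different starting view

mags_a = dft_magnitudes(views)
mags_b = dft_magnitudes(shifted)
print(all(abs(a - b) < 1e-9 for a, b in zip(mags_a, mags_b)))
```

Because the magnitudes agree for any rotation, a descriptor built from them does not depend on which view is taken as the first one.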


Journal ArticleDOI
TL;DR: A visual object tracking framework, which employs an appearance-based representation of the target object, based on local steering kernel descriptors and color histogram information, which is proven to be successful in tracking objects under scale and rotation variations and partial occlusion, as well as in tracking rather slowly deformable articulated objects.
Abstract: In this paper, we propose a visual object tracking framework, which employs an appearance-based representation of the target object, based on local steering kernel descriptors and color histogram information. This framework takes as input the region of the target object in the previous video frame and a stored instance of the target object, and tries to localize the object in the current frame by finding the frame region that best resembles the input. As the object view changes over time, the object model is updated, hence incorporating these changes. Color histogram similarity between the detected object and the surrounding background is employed for background subtraction. Experiments are conducted to test the performance of the proposed framework under various conditions. The proposed tracking scheme is proven to be successful in tracking objects under scale and rotation variations and partial occlusion, as well as in tracking rather slowly deformable articulated objects.

65 citations
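The abstract above mentions color histogram similarity without naming a specific measure. As a minimal sketch, the code below assumes a coarse joint RGB histogram and the Bhattacharyya coefficient as the similarity; both choices, along with the function names and bin count, are illustrative assumptions rather than the paper's method.

```python
import math

def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of (r, g, b) pixels with values in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 / bins
    for r, g, b in pixels:
        idx = (int(r // step) * bins + int(g // step)) * bins + int(b // step)
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1 for identical histograms, 0 for disjoint ones."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Hypothetical object and background patches.
object_patch = [(250, 10, 10)] * 5
background_patch = [(10, 10, 250)] * 5
print(bhattacharyya(color_histogram(object_patch), color_histogram(background_patch)))
```

A low similarity between the object region and its surroundings suggests the two are easy to separate by color, while a high value suggests that color alone is a weak cue.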


Journal ArticleDOI
TL;DR: This brief proposes an optimization scheme aiming at the optimal class representation, in terms of Fisher ratio maximization, for LDA-based data projection, and achieves higher classification rates in publicly available data sets.
Abstract: Linear discriminant analysis (LDA) is a widely used technique for supervised feature extraction and dimensionality reduction. LDA determines an optimal discriminant space for linear data projection based on certain assumptions, e.g., on using normal distributions for each class and employing class representation by the mean class vectors. However, there might be other vectors that can represent each class, to increase class discrimination. In this brief, we propose an optimization scheme aiming at the optimal class representation, in terms of Fisher ratio maximization, for LDA-based data projection. Compared with the standard LDA approach, the proposed optimization scheme increases class discrimination in the reduced dimensionality space and achieves higher classification rates in publicly available data sets.

55 citations
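The Fisher ratio that the abstract above seeks to maximize can be sketched for a single projection direction as the between-class scatter divided by the within-class scatter of the projected data. The code below is an illustration under stated assumptions: the optional `representatives` argument is a hypothetical way to swap the class means for other class vectors, and the overall mean is kept as the data mean. The paper's optimization procedure itself is not reproduced.

```python
def project(points, w):
    """Project each point onto direction w (dot product)."""
    return [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in points]

def fisher_ratio(classes, w, representatives=None):
    """Between-class over within-class scatter along w.

    classes: list of classes, each a list of points.
    representatives: optional per-class vectors; class means are used if omitted.
    """
    projected = [project(c, w) for c in classes]
    if representatives is None:
        reps = [sum(z) / len(z) for z in projected]
    else:
        reps = project(representatives, w)
    n_total = sum(len(z) for z in projected)
    overall = sum(sum(z) for z in projected) / n_total
    between = sum(len(z) * (r - overall) ** 2 for z, r in zip(projected, reps))
    within = sum((v - r) ** 2 for z, r in zip(projected, reps) for v in z)
    return between / within

# Hypothetical two-class data in 2-D.
classes = [[(1, 2), (2, 3), (3, 3)], [(6, 5), (7, 8), (8, 7)]]
print(fisher_ratio(classes, (1, 0)), fisher_ratio(classes, (0, 1)))
```

On this toy data the horizontal direction separates the classes better than the vertical one, so it yields the larger ratio.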


Journal ArticleDOI
TL;DR: A novel method that performs dynamic action classification by exploiting the effectiveness of the Extreme Learning Machine (ELM) algorithm for single hidden layer feedforward neural networks training.

53 citations


Journal ArticleDOI
TL;DR: A novel sequence representation, based on fuzzy distances from optimal representative signal instances called statemes, is proposed, together with a novel modified clustering discriminant analysis algorithm that minimizes the adopted criterion and leads to the optimal discriminant sequence class representation in a low-dimensional space.
Abstract: In this paper, we present a novel method aiming at multidimensional sequence classification. We propose a novel sequence representation, based on its fuzzy distances from optimal representative signal instances, called statemes. We also propose a novel modified clustering discriminant analysis algorithm that minimizes the adopted criterion with respect to both the data projection matrix and the class representation, leading to the optimal discriminant sequence class representation in a low-dimensional space. Based on this representation, simple classification algorithms, such as the nearest subclass centroid, provide high classification accuracy. A three-step iterative optimization procedure for choosing statemes, optimal discriminant subspace and optimal sequence class representation in the final decision space is proposed. The classification procedure is fast and accurate. The proposed method has been tested on a wide variety of multidimensional sequence classification problems, including handwritten character recognition, time series classification and human activity recognition, providing very satisfactory classification results.

40 citations


Proceedings ArticleDOI
16 Oct 2013
TL;DR: This paper provides a comprehensive survey of multi-view human action recognition approaches following an application-based categorization: methods are categorized based on their ability to operate using a fixed or an arbitrary number of cameras.
Abstract: While single-view human action recognition has attracted considerable research attention in the last three decades, multi-view action recognition is still a less explored field. This paper provides a comprehensive survey of multi-view human action recognition approaches. The approaches are reviewed following an application-based categorization: methods are categorized based on their ability to operate using a fixed or an arbitrary number of cameras. Finally, benchmark databases frequently used for evaluation of multi-view approaches are briefly described.

27 citations


Journal ArticleDOI
TL;DR: A novel algorithm for recognizing pornographic images based on the analysis of skin color regions is presented and is shown to exhibit state-of-the-art performance against publicly available integrated pornographic image classifiers.
Abstract: In this study, a novel algorithm for recognizing pornographic images based on the analysis of skin color regions is presented. The skin color information essentially provides Regions of Interest (ROIs). It is demonstrated that the convex hull of these ROIs provides semantically useful information for pornographic image detection. Based on these convex hulls, the authors extract a small set of low-level visual features that are empirically proven to possess discriminative power for pornographic image classification. In this study, we consider multi-class pornographic image classification, where the "nude" and "benign" image classes are further split into two specialized sub-classes, namely "bikini" / "porn" and "skin" / "non-skin", respectively. The extracted feature vectors are fed to an ensemble of random forest classifiers for image classification. Each classifier is trained on a partition of the training set and solves a binary classification problem. In this sense, the model allows for seamless coarse-to-fine-grained classification by means of a tree-structured topology of a small number of intervening binary classifiers. The overall technique is evaluated on the AIIA-PID challenge of 9,000 samples of pornographic and benign images collected from the Web. The technique is shown to exhibit state-of-the-art performance against publicly available integrated pornographic image classifiers. Index Terms: convex hull calculation, multi-class classification, porn detection, random forests, skin ROI localization

26 citations


Journal ArticleDOI
TL;DR: Experimental results in several databases indicate that the incorporation of the maximum margin classification constraints into the NMF and discriminant NMF objective functions improves the accuracy of the classification.
Abstract: The state-of-the-art classification methods which employ nonnegative matrix factorization (NMF) employ two consecutive independent steps. The first one performs data transformation (dimensionality reduction) and the second one classifies the transformed data using classification methods, such as nearest neighbor/centroid or support vector machines (SVMs). In the following, we focus on using NMF factorization followed by SVM classification. Typically, the parameters of these two steps, e.g., the NMF bases/coefficients and the support vectors, are optimized independently, thus leading to suboptimal classification performance. In this paper, we merge these two steps into one by incorporating maximum margin classification constraints into the standard NMF optimization. The notion behind the proposed framework is to perform NMF, while ensuring that the margin between the projected data of the two classes is maximal. The concurrent NMF factorization and support vector optimization are performed through a set of multiplicative update rules. In the same context, the maximum margin classification constraints are imposed on the NMF problem with additional discriminant constraints and respective multiplicative update rules are extracted. The impact of the maximum margin classification constraints on the NMF factorization problem is addressed in Section VI. Experimental results in several databases indicate that the incorporation of the maximum margin classification constraints into the NMF and discriminant NMF objective functions improves the accuracy of the classification.

23 citations
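The multiplicative update rules mentioned in the abstract above build on those of standard NMF. As background, here is a minimal sketch of the classical Lee and Seung updates for the Frobenius objective, factorizing a nonnegative matrix V into W and H. It does not include the maximum margin or discriminant constraints the paper adds; the function names, iteration count, and random initialization are assumptions for illustration.

```python
import random

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Standard NMF via multiplicative updates (no margin constraints)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        # H <- H * (W^T V) / (W^T W H), elementwise
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        # W <- W * (V H^T) / (W H H^T), elementwise
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h

# Hypothetical nonnegative data matrix of rank 2.
v = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]
w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

Because each update multiplies by a ratio of nonnegative quantities, the factors stay nonnegative throughout, which is the property that makes this style of update attractive as a base for adding further constraints.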


Proceedings ArticleDOI
26 May 2013
TL;DR: A novel algorithm is proposed, that performs tracking of rigid objects in 3D videos, without knowledge of the camera calibration parameters, by exploiting only visual information obtained from the left and right video channels, namely luminance and disparity information.
Abstract: A novel algorithm is proposed, that performs tracking of rigid objects in 3D videos, without knowledge of the camera calibration parameters, by exploiting only visual information obtained from the left and right video channels, namely luminance and disparity information. The proposed algorithm exploits noisy disparity maps that have been extracted by a real-time disparity estimation algorithm. The algorithm employs two appearance-based representation methods for describing the object texture. The first one combines luminance with disparity information and the second one employs Local Steering Kernel (LSK) descriptors.

Proceedings ArticleDOI
16 Apr 2013
TL;DR: Fuzzy Vector Quantization is applied to the human body poses appearing in a video in order to obtain a compact video representation, that will be used for person identification and action recognition.
Abstract: In this paper, we propose a person identification method exploiting human motion information. A Self Organizing Neural Network is employed in order to determine a topographic map of representative human body poses. Fuzzy Vector Quantization is applied to the human body poses appearing in a video in order to obtain a compact video representation, that will be used for person identification and action recognition. Two feedforward Artificial Neural Networks are trained to recognize the person ID and action class labels of a given test action video. Network outputs combination, based on another feedforward network, is performed in the case of multiple cameras used in the training and identification phases. Experimental results on two publicly available databases evaluate the performance of the proposed person identification approach.

Journal ArticleDOI
TL;DR: A novel Support Vector Machine (SVM) variant, which makes use of robust statistics, is proposed, and it performs better than other SVM variants, especially in cases where the training data contain outliers.

Journal ArticleDOI
TL;DR: This work presents a novel method for analyzing FISH images based on the statistical properties of Radial Basis Functions; the method is evaluated on a data set of 100 breast carcinoma cases provided by the Aristotle University of Thessaloniki and the University of Pisa, with promising results.

Journal ArticleDOI
TL;DR: The performance of the proposed human action recognition method is evaluated on two publicly available action recognition databases aiming at different application scenarios.

Proceedings ArticleDOI
04 Apr 2013
TL;DR: A comparative study of the discriminative ability of different actions for person identification is provided, indicating that several actions besides walking can be exploited for person identification.
Abstract: In this paper we present a view-independent person identification method exploiting motion information. A multi-camera setup is used in order to capture the human body during action execution from different viewing angles. The method is able to incorporate several everyday actions in person identification. A comparative study of the discriminative ability of different actions for person identification is provided, indicating that several actions besides walking can be exploited for person identification.

Proceedings ArticleDOI
01 Sep 2013
TL;DR: The proposed active classification method provides enhanced classification performance in two publicly available action recognition databases.
Abstract: In this paper, we propose a novel classification method involving two processing steps. Given a test sample, the training data residing in its neighborhood are determined. Classification is performed by a Single-hidden Layer Feedforward Neural network exploiting labeling information of the training data appearing in the test sample neighborhood and using the remaining training data as unlabeled. By following this approach, the proposed classification method focuses the classification problem on the training data that are more similar to the test sample under consideration and exploits information concerning the training set structure. Compared to both static classification exploiting all the available training data and dynamic classification involving data selection for classification, the proposed active classification method provides enhanced classification performance in two publicly available action recognition databases.

Proceedings ArticleDOI
10 Jun 2013
TL;DR: A novel view-independent human action recognition method is proposed that effectively addresses the camera viewpoint identification problem, i.e., the identification of the position of each camera with respect to the person's body.
Abstract: In this paper a novel view-independent human action recognition method is proposed. A multi-camera setup is used to capture the human body from different viewing angles. Actions are described by a novel action representation, the so-called multi-view action image (MVAI), which effectively addresses the camera viewpoint identification problem, i.e., the identification of the position of each camera with respect to the person's body. Linear Discriminant Analysis is applied on the MVAIs in order to map actions to a discriminant feature space where actions are classified by using a simple nearest class centroid classification scheme. Experimental results demonstrate the effectiveness of the proposed action recognition approach.
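The nearest class centroid scheme named in the abstract above can be sketched in a few lines: each class is summarized by the mean of its training vectors, and a test vector receives the label of the closest mean. The feature vectors and labels below are hypothetical stand-ins for LDA-projected action descriptors, and Euclidean distance is an assumed choice.

```python
import math

def class_centroids(samples, labels):
    """Mean vector of the training samples of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def classify(x, centroids):
    """Label of the centroid closest to x in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Hypothetical 2-D projected features for two action classes.
train = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]]
labels = ["walk", "walk", "run", "run"]
centroids = class_centroids(train, labels)
print(classify([0.1, 0.2], centroids))
```

Such a simple rule works well only when the projection has already made the classes compact and well separated, which is the role the discriminant mapping plays in the described pipeline.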

Proceedings ArticleDOI
16 Oct 2013
TL;DR: This paper proposes an optimization scheme aiming at determining the optimal subclass representation for CDA-based data projection; the scheme has been evaluated on standard classification problems, as well as on two publicly available human action recognition databases, providing enhanced class discrimination.
Abstract: Clustering-based Discriminant Analysis (CDA) is a well-known technique for supervised feature extraction and dimensionality reduction. CDA determines an optimal discriminant subspace for linear data projection based on the assumptions of normal subclass distributions and subclass representation by using the mean subclass vector. However, in several cases, there might be other subclass representative vectors that could be more discriminative, compared to the mean subclass vectors. In this paper we propose an optimization scheme aiming at determining the optimal subclass representation for CDA-based data projection. The proposed optimization scheme has been evaluated on standard classification problems, as well as on two publicly available human action recognition databases providing enhanced class discrimination, compared to the standard CDA approach.

Proceedings ArticleDOI
16 Apr 2013
TL;DR: A method for performing semi-automatic identity label annotation on facial images obtained from monocular and stereoscopic videos is introduced; it is intended for use by archivists for semi-automatic annotation of television content, in order to further enable journalists to directly access video shots/frames of interest.
Abstract: In this paper, a method for performing semiautomatic identity label annotation on facial images, obtained from monocular and stereoscopic videos is introduced. The proposed method exploits prior information for the data structure, obtained from the application of a clustering algorithm, for the selection of the facial images from which label inference should begin. Then, a sparse graph is constructed according to the Linear Neighborhood Propagation (LNP) method and, finally, label inference is performed according to an iterative update rule. In the case of stereoscopic videos, the classification decision is determined by the combined information of the left and right channels. The objective of the proposed framework is to be used by archivists for semi-automatic annotation of television content, in order to further enable journalists to directly access video shots/frames of interest.
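The iterative label inference described in the abstract above can be illustrated with a generic propagation rule of the form F ← αWF + (1 − α)Y, where W holds graph weights and Y the initial labels. This is a sketch under assumptions: the weight matrix below is hand-written for a four-node chain rather than computed by Linear Neighborhood Propagation, and the value of α, the iteration count, and the function name are illustrative.

```python
def propagate_labels(weights, initial, alpha=0.9, iterations=100):
    """Iterate F <- alpha * W F + (1 - alpha) * Y and return the final scores."""
    n, k = len(initial), len(initial[0])
    scores = [row[:] for row in initial]
    for _ in range(iterations):
        scores = [
            [
                alpha * sum(weights[i][j] * scores[j][c] for j in range(n))
                + (1 - alpha) * initial[i][c]
                for c in range(k)
            ]
            for i in range(n)
        ]
    return scores

# Hypothetical chain of four facial images; rows of W sum to 1.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.0, 0.3, 0.0],
    [0.0, 0.3, 0.0, 0.7],
    [0.0, 0.0, 1.0, 0.0],
]
# Images 0 and 3 carry identity labels A and B; images 1 and 2 are unlabeled.
Y = [[1, 0], [0, 0], [0, 0], [0, 1]]
names = ["A", "B"]
scores = propagate_labels(W, Y)
print([names[row.index(max(row))] for row in scores])
```

Each unlabeled image ends up with the identity of the labeled image it is most strongly connected to, which is why the choice of the initially labeled images matters for the final result.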

Proceedings ArticleDOI
10 Jun 2013
TL;DR: This paper proposes two novel algorithms that exploit available disparity information to detect two disturbing stereoscopic issues, namely depth jump cuts and bent window effects.
Abstract: 3DTV and 3D cinema are witnessing a significant increase in popularity. New movie titles are released in 3D and there are more than 35 TV channels in various countries that broadcast in 3D worldwide. It is well known, and becomes more obvious as the availability of 3D video content increases, that stereoscopy is associated with certain 3D video quality issues that may negatively affect the 3D viewing experience. In this paper, we propose two novel algorithms that exploit available disparity information to detect two disturbing stereoscopic issues, namely depth jump cuts and bent window effects. Representative examples are provided to assess the algorithms' performance. The proposed algorithms can be helpful in the post-production stage, where, in most cases, the detected issues can be fixed, and also in assessing the overall quality of stereoscopic video content.

Proceedings ArticleDOI
09 Sep 2013
TL;DR: A novel variant of the Normalized Cut (N-Cut) clustering algorithm that incorporates imposed constraints is implemented and evaluated on facial image clustering for 3D video analysis.
Abstract: In this paper a novel variant of the Normalized Cut (N-Cut) clustering algorithm that incorporates imposed constraints is implemented and evaluated on facial image clustering for 3D video analysis. The clustering problem is seen as a graph cut problem through a similarity matrix representing the relation among the vertices, i.e. facial images in this work. Mutual Information is used as similarity metric, applied on the HSV color space of the original images. This work considers the incorporation of constraints either regarding similarity or dissimilarity derived from a priori available information in the clustering procedure and evaluates the performance increase by their use. Experiments are conducted on 3D videos where a priori information about the facial images exists.

Proceedings ArticleDOI
04 Apr 2013
TL;DR: A method that performs semantic labeling of movement direction in stereo videos along the horizontal and vertical axes and also along the depth axis when disparity information is available is described.
Abstract: The use of a stereo camera in surveillance applications adds information about the depth position of the object being monitored. Thus, stereo videos can help extract semantic information about a person or object movement direction, not only on the image plane, but also in depth space. This paper describes a method that performs semantic labeling of movement direction in stereo videos along the horizontal and vertical axes and also along the depth axis when disparity information is available. A method that extracts information about whether two or more objects are approaching or moving away is also presented.
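The idea of labeling movement direction along the horizontal, vertical and depth axes can be sketched with a simple threshold rule on the displacement between two tracked positions. Everything specific below is an assumption for illustration: the function name, the threshold, the image convention that y grows downward, and the convention that a larger disparity means the object is closer to the camera. The paper's actual labeling rules are not reproduced.

```python
def label_direction(start, end, min_shift=2.0):
    """Label movement between two (x, y, disparity) positions.

    disparity may be None when no disparity information is available.
    Assumes y grows downward and larger disparity means closer to the camera.
    """
    labels = []
    dx, dy = end[0] - start[0], end[1] - start[1]
    if abs(dx) >= min_shift:
        labels.append("right" if dx > 0 else "left")
    if abs(dy) >= min_shift:
        labels.append("down" if dy > 0 else "up")
    if start[2] is not None and end[2] is not None:
        dd = end[2] - start[2]
        if abs(dd) >= min_shift:
            labels.append("approaching camera" if dd > 0 else "moving away from camera")
    return labels or ["static"]

# Hypothetical tracked object centers in two frames.
print(label_direction((10, 10, 5.0), (30, 11, 9.0)))
```

The depth label is produced only when both positions carry a disparity value, mirroring the condition stated in the abstract that depth labeling requires disparity information.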

Proceedings Article
01 Sep 2013
TL;DR: The proposed dynamic classification scheme has been applied to human action recognition by employing the Bag of Visual Words (BoVW)-based action video representation providing enhanced classification performance compared to the static classification approach.
Abstract: In this paper we present a dynamic classification scheme involving Single-hidden Layer Feedforward Neural (SLFN) network-based non-linear data mapping and test sample-specific labeled data selection in multiple levels. The number of levels is dynamically determined by the test sample under consideration, while the use of Extreme Learning Machine (ELM) algorithm for SLFN network training leads to fast operation. The proposed dynamic classification scheme has been applied to human action recognition by employing the Bag of Visual Words (BoVW)-based action video representation providing enhanced classification performance compared to the static classification approach.

Proceedings ArticleDOI
10 Jun 2013
TL;DR: A general Bayesian post-processing methodology for performance improvement of object tracking in stereo video sequences is proposed in this paper and the improvements introduced by the proposed methodology in terms of tracking accuracy are quantified through experimental analysis.
Abstract: A general Bayesian post-processing methodology for performance improvement of object tracking in stereo video sequences is proposed in this paper. We utilize the results of any single channel visual object tracker in a Bayesian framework, in order to refine the tracking accuracy in both stereo video channels. In this framework, a variational Bayesian algorithm is employed, where prior knowledge about the object displacement (movement) is incorporated via a prior distribution. This displacement information is obtained in a preprocessing step, where object displacement is estimated via feature extraction and matching. In parallel, disparity information is extracted and utilized in the same framework. The improvements introduced by the proposed methodology in terms of tracking accuracy are quantified through experimental analysis.

Proceedings ArticleDOI
01 Sep 2013
TL;DR: Two novel algorithms are proposed that exploit available disparity information, in order to detect two disturbing stereoscopic issues, namely Stereoscopic Window Violations (SWV) and bent window effects.
Abstract: 3DTV and 3D cinema have become quite popular during the last few years. It is now well understood that certain 3D video quality issues may have a negative effect in the 3D viewing experience. In this paper, we propose two novel algorithms that exploit available disparity information, in order to detect two disturbing stereoscopic issues, namely Stereoscopic Window Violations (SWV) and bent window effects. The algorithms' performance is tested on a number of examples. The proposed algorithms can be used for assessing the overall quality of stereoscopic video content or in order to enable fixing the detected issues in a post-production stage.

Proceedings ArticleDOI
01 Jul 2013
TL;DR: A game with a purpose, called Erasitechnis GWAP, has been designed and deployed on Facebook as an enjoyable way of encouraging users to enroll and provide music annotations; it focuses on tagging Greek folk music and promotes less frequent tags.
Abstract: A web content management system (Web CMS) has been developed to host a corpus of 405 folk songs for media sharing. It is based on Drupal 7. Exploring various modules, the system offers multiple functionalities, including user registration, layered information about each song in different pages, similar content search, friendship relationships between the registered users, favorite lists, etc. Users can visit song pages, listen to the songs, and provide feedback. In addition to the Web CMS, a game with a purpose (GWAP), called Erasitechnis GWAP, has been designed and deployed on Facebook as an enjoyable way encouraging users to enroll and provide music annotations. Erasitechnis GWAP is a combined approach of existing games with a purpose, which focuses on tagging Greek folk music promoting less frequent tags. By doing so, descriptive tags are collected enabling the creation of a fully annotated dataset, which is exploited to train recommendation or autotagging systems.

Proceedings ArticleDOI
04 Apr 2013
TL;DR: Spectral clustering techniques are implemented and evaluated on image clustering for single-channel and 3D video analysis, and are then extended to stereo video.
Abstract: In this paper spectral clustering techniques are implemented and evaluated on image clustering for single channel and 3D video analysis. The main idea is to use mutual information to create a similarity matrix for image pairs and then apply spectral clustering. Such clustering techniques are then extended to stereo video. The application at hand is facial image clustering in single-view and stereo videos.
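The first step described in the abstract above, building a similarity matrix from mutual information between image pairs, can be sketched as follows. The code assumes equal-size grayscale images given as flat lists of intensities and a coarse quantization into a few bins; these choices and the function names are illustrative assumptions, and the subsequent spectral clustering step is not shown.

```python
import math
from collections import Counter

def mutual_information(a, b, bins=4, max_value=256):
    """Mutual information (in nats) between two equal-length intensity lists."""
    step = max_value / bins
    qa = [int(v // step) for v in a]  # quantize intensities into bins
    qb = [int(v // step) for v in b]
    n = len(qa)
    pa, pb, pab = Counter(qa), Counter(qb), Counter(zip(qa, qb))
    return sum(
        (c / n) * math.log((c / n) / ((pa[i] / n) * (pb[j] / n)))
        for (i, j), c in pab.items()
    )

def similarity_matrix(images, bins=4):
    """Pairwise mutual information between all images."""
    return [[mutual_information(a, b, bins) for b in images] for a in images]

# Hypothetical tiny "images" as flat intensity lists.
img_a = [0, 64, 128, 192] * 3
img_b = [10, 70, 130, 200] * 3   # same structure as img_a, slightly brighter
img_c = [100] * 12               # flat image, shares no information
for row in similarity_matrix([img_a, img_b, img_c]):
    print([round(v, 3) for v in row])
```

Mutual information is symmetric, so the resulting matrix can be used directly as the affinity matrix of a graph whose vertices are the images.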

Proceedings Article
01 Sep 2013
TL;DR: A novel dimensionality reduction method is presented which aims to identify a low dimensional projection subspace, where samples form classes that are better discriminated and separated with maximum margin, and has been applied for facial expression recognition in Cohn-Kanade database verifying its superiority in this task.
Abstract: We present a novel dimensionality reduction method which aims to identify a low dimensional projection subspace, where samples form classes that are better discriminated and separated with maximum margin. The proposed method brings certain advantages, both to data embedding and classification. It improves classification performance and reduces the required training time of the SVM classifier, since the classifier is trained over the projected low dimensional samples. In addition, data outliers and the overall distribution of data samples inside classes do not affect its performance. The proposed method has been applied for facial expression recognition in Cohn-Kanade database verifying its superiority in this task, against other state-of-the-art dimensionality reduction techniques.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: The purpose of this paper is to propose algorithms for semantically characterizing the motion of an object or groups of objects along any of the X, Y, Z axes.
Abstract: The efficient search and retrieval of the increasing volume of stereoscopic videos drives the need for the semantic description of its content. The derivation of disparity (depth) information from stereoscopic content allows the extraction of semantic information that is inherent to 3D. The purpose of this paper is to propose algorithms for semantically characterizing the motion of an object or groups of objects along any of the X, Y, Z axes. Experimental results are also provided.