Showing papers by "Ioannis Pitas" published in 2015


Journal ArticleDOI
TL;DR: The connection of the kernel versions of the ELM classifier with infinite Single-hidden Layer Feedforward Neural networks is discussed, and it is shown that the original ELM kernel definition can be adopted for the calculation of the ELM kernel matrix for two of the most common activation functions.

145 citations


Journal ArticleDOI
TL;DR: An extension of the Extreme Learning Machine algorithm for Single-hidden Layer Feedforward Neural network training that incorporates Dropout and DropConnect regularization in its optimization process is proposed and it is shown that the adoption of such a regularization approach can lead to better solutions for the network output weights.

41 citations


Journal ArticleDOI
TL;DR: The proposed class-specific reference discriminant analysis is tested on three problems relating to human behavior analysis (face recognition, facial expression recognition, and human action recognition), where it outperforms kernel discriminant analysis, kernel spectral regression, and class-specific kernel discriminant analysis, as well as support vector machine-based classification, in most cases.
Abstract: In this paper, a novel nonlinear subspace learning technique for class-specific data representation is proposed. A novel data representation is obtained by applying nonlinear class-specific data projection to a discriminant feature space, where the data belonging to the class under consideration are enforced to be close to their class representation, while the data belonging to the remaining classes are enforced to be as far as possible from it. A class is represented by an optimized class vector, enhancing class discrimination in the resulting feature space. An iterative optimization scheme is proposed to this end, where both the optimal nonlinear data projection and the optimal class representation are determined in each optimization step. The proposed approach is tested on three problems relating to human behavior analysis: Face recognition, facial expression recognition, and human action recognition. Experimental results denote the effectiveness of the proposed approach, since the proposed class-specific reference discriminant analysis outperforms kernel discriminant analysis, kernel spectral regression, and class-specific kernel discriminant analysis, as well as support vector machine-based classification, in most cases.

40 citations


Journal ArticleDOI
TL;DR: A distributed approach to the Kernel k-Means clustering algorithm is presented, in order to make its application to a large number of samples feasible and, thus, achieve high performance clustering results on very big datasets.

34 citations


Journal ArticleDOI
TL;DR: Methods for proper graph construction based on the structure of the available data and label inference methods for spreading label information from a few labeled data to a larger set of unlabeled data are reviewed.
Abstract: The expansion of the Internet over the last decade and the proliferation of online social communities, such as Facebook, Google+, and Twitter, as well as multimedia sharing sites, such as YouTube, Flickr, and Picasa, have led to a vast increase in the information available to the user. In the case of multimedia data, such as images and videos, fast querying and processing of the available information requires the annotation of the multimedia data with semantic descriptors, that is, labels. However, only a small proportion of the available data are labeled. The rest should undergo an annotation-labeling process. The necessity for the creation of automatic annotation algorithms gave birth to label propagation and semi-supervised learning. In this study, basic concepts in graph-based label propagation methods are discussed. Methods for proper graph construction based on the structure of the available data and label inference methods for spreading label information from a few labeled data to a larger set of unlabeled data are reviewed. Applications of label propagation algorithms in digital media, as well as evaluation metrics for measuring their performance, are presented.

31 citations


Reference BookDOI
22 Dec 2015
TL;DR: Graph-Based Social Media Analysis provides a comprehensive introduction to the use of graph analysis in the study of social and digital media and presents various approaches to storing vast amounts of data online and retrieving that data in real-time.
Abstract: Focused on the mathematical foundations of social media analysis, Graph-Based Social Media Analysis provides a comprehensive introduction to the use of graph analysis in the study of social and digital media. It addresses an important scientific and technological challenge, namely the confluence of graph analysis and network theory with linear algebra, digital media, machine learning, big data analysis, and signal processing. Supplying an overview of graph-based social media analysis, the book provides readers with a clear understanding of social media structure. It uses graph theory, particularly the algebraic description and analysis of graphs, in social media studies. The book emphasizes the big data aspects of social and digital media. It presents various approaches to storing vast amounts of data online and retrieving that data in real-time. It demystifies complex social media phenomena, such as information diffusion, marketing and recommendation systems in social media, and evolving systems. It also covers emerging trends, such as big data analysis and social media evolution. Describing how to conduct proper analysis of the social and digital media markets, the book provides insights into processing, storing, and visualizing big social media data and social graphs. It includes coverage of graphs in social and digital media, graph and hyper-graph fundamentals, mathematical foundations coming from linear algebra, algebraic graph analysis, graph clustering, community detection, graph matching, web search based on ranking, label propagation and diffusion in social media, graph-based pattern recognition and machine learning, graph-based pattern classification and dimensionality reduction, and much more. This book is an ideal reference for scientists and engineers working in social media and digital media production and distribution. It is also suitable for use as a textbook in undergraduate or graduate courses on digital media, social media, or social networks.

29 citations


Journal ArticleDOI
TL;DR: Experimental results on several face image databases show the effectiveness and robustness of LRSRC in face image recognition.
Abstract: Face recognition has attracted great interest due to its importance in many real-world applications. In this paper, we present a novel low-rank sparse representation-based classification (LRSRC) method for robust face recognition. Given a set of test samples, LRSRC seeks the lowest-rank and sparsest representation matrix over all training samples. Since the low-rank model can reveal the subspace structures of data while sparsity helps to recognize the data class, the obtained test sample representations are both representative and discriminative. Using the representation vector of a test sample, LRSRC classifies the test sample into the class which generates the minimal reconstruction error. Experimental results on several face image databases show the effectiveness and robustness of LRSRC in face image recognition.
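The final decision rule described in this abstract (assign the test sample to the class with minimal reconstruction error) can be sketched as follows. This is only an illustration under stated assumptions: the representation coefficients are taken as already computed, the low-rank and sparse optimization itself is not shown, and the function name and argument layout are hypothetical rather than taken from the paper.

```python
from math import sqrt

def classify_by_residual(test_sample, train_samples, train_labels, coeffs):
    """Return the label whose training samples best reconstruct test_sample.

    train_samples: list of feature vectors, one per training sample
    train_labels:  class label of each training sample
    coeffs:        one representation coefficient per training sample
                   (assumed to be given; how they are obtained is not shown)
    """
    best_label, best_error = None, float("inf")
    for label in sorted(set(train_labels)):
        # Reconstruct the test sample using only this class's coefficients.
        recon = [0.0] * len(test_sample)
        for x, y, c in zip(train_samples, train_labels, coeffs):
            if y == label:
                for d, value in enumerate(x):
                    recon[d] += c * value
        # Euclidean norm of the class-wise reconstruction residual.
        error = sqrt(sum((t - r) ** 2 for t, r in zip(test_sample, recon)))
        if error < best_error:
            best_label, best_error = label, error
    return best_label
```

With two training samples `[1, 0]` (class "a") and `[0, 1]` (class "b"), a test sample `[1, 0]` whose coefficients are `[0.9, 0.1]` is reconstructed far better by class "a", so that label is returned.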

22 citations


Journal ArticleDOI
TL;DR: A novel dimensionality reduction (DR) algorithm that uses subclass discriminant information, called Subclass Marginal Fisher Analysis (SMFA), is proposed, and it is shown that SMFA outperforms the state of the art in most cases, demonstrating the efficacy and power of SGE as a platform for developing new methods.

21 citations


Proceedings ArticleDOI
10 Dec 2015
TL;DR: The proposed Approximate Kernel Extreme Learning Machine classifier is able to scale well in both time and memory, while achieving good generalization performance in large-scale nonlinear facial image classification problems.
Abstract: In this paper, we propose a scheme that can be used in large-scale nonlinear facial image classification problems. An approximate solution of the kernel Extreme Learning Machine classifier is formulated and evaluated. Experiments on two publicly available facial image datasets using two popular facial image representations illustrate the effectiveness and efficiency of the proposed approach. The proposed Approximate Kernel Extreme Learning Machine classifier is able to scale well in both time and memory, while achieving good generalization performance. Specifically, it is shown that it outperforms the standard ELM approach for the same time and memory requirements. Compared to the original kernel ELM approach, it achieves similar (or better) performance, while scaling well in both time and memory with respect to the training set cardinality.

18 citations


Proceedings ArticleDOI
28 Dec 2015
TL;DR: A novel, low-level video frame description method is proposed that is able to compactly capture informative image statistics from luminance, color and stereoscopic disparity video data, both in a global and in various local scales.
Abstract: A novel, low-level video frame description method is proposed that is able to compactly capture informative image statistics from luminance, color and stereoscopic disparity video data, both in a global and in various local scales. Thus, scene texture, illumination and geometry properties may succinctly be contained within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., shot frame clustering. The computed key-frames are subsequently used to derive a movie summary in the form of a video skim, which is suitably post-processed to reduce stereoscopic video defects that cause visual fatigue and are a by-product of the summarization.

16 citations


Proceedings Article
01 Mar 2015
TL;DR: The human centered interface specifications and implementations for such a system, which can be supported by ambient intelligence and robotic technologies, are described and a multi-view eating and drinking activity recognition database that has been created in order to facilitate research towards this direction is described.
Abstract: Assisted living has a particular social importance in most developed societies, due to the increased life expectancy of the general population and the ensuing ageing problems. It is also important for the provision of improved home care in cases of disabled persons or persons suffering from certain diseases that have high social impact. In this context, the development of computer vision systems capable of identifying human eating and drinking activity can be really useful in order to prevent undernourishment/malnutrition and dehydration in a smart home environment that aims to extend the independent living of older persons in the early stage of dementia. In this paper, we first describe the human centered interface specifications and implementations for such a system, which can be supported by ambient intelligence and robotic technologies. We subsequently describe a multi-view eating and drinking activity recognition database that has been created in order to facilitate research in this direction. The database has been created by using four cameras in order to produce multi-view videos, each depicting one of twelve persons having a meal, resulting in a database size equal to 59.68 hours in total. Various types of meals have been recorded, i.e., breakfast, lunch and fast food. Moreover, the persons have different sizes and clothing and are of different sex. The database has been annotated on a frame basis in terms of person ID and activity class. We hope that such a database will serve as a benchmark data set for computer vision researchers in order to devise methods targeting this important application.

Proceedings ArticleDOI
19 Apr 2015
TL;DR: This paper proposes a method for video summarization based on human activity description that is able to outperform OC-SVM-based video segment selection and evaluates the proposed approach in three Hollywood movies.
Abstract: In this paper, we propose a method for video summarization based on human activity description. We formulate this problem as one of automatic video segment selection based on a learning process that employs salient video segment paradigms. For this one-class classification problem, we introduce a novel variant of the One-Class Support Vector Machine (OC-SVM) classifier that exploits subclass information in the OC-SVM optimization problem, in order to jointly minimize the data dispersion within each subclass and determine the optimal decision function. We evaluate the proposed approach on three Hollywood movies, where the performance of the proposed SOC-SVM algorithm is compared with that of the OC-SVM. Experimental results indicate that the proposed approach is able to outperform OC-SVM-based video segment selection.

Proceedings ArticleDOI
20 Aug 2015
TL;DR: This paper proposes a new scalable solution for the Least Squares One-Class Support Vector Machine classifier by following an approximate kernel approach and evaluates the proposed method in big data visual classification problems, where it is shown to achieve satisfactory performance while significantly reducing the overall computational and memory costs.
Abstract: Large-scale multi-class classification problems involve an enormous amount of training data that make the application of classical non-linear classification algorithms difficult. In addition, such multi-class classification problems are usually formed by a considerable number of classes. This makes the application of the popular one-versus-rest binary classifiers fusion scheme adopted by most state-of-the-art approaches difficult. In this paper, in order to overcome the high computational cost of multi-class non-linear classification approaches, we adopt an ensemble of approximate non-linear one-class classifiers. To this end, we propose a new scalable solution for the Least Squares One-Class Support Vector Machine classifier by following an approximate kernel approach. We evaluated the proposed method in big data visual classification problems, where it is shown that it is able to achieve satisfactory performance, while significantly reducing the overall computational and memory costs.

Journal ArticleDOI
TL;DR: It is shown that the proposed regularizer is able to weight the dimensions of the ELM space according to the importance of the network's hidden layer weights, without imposing additional computational and memory costs in the network learning process.

Journal ArticleDOI
01 Sep 2015
TL;DR: The Approximate Kernel k-Means algorithm works using only a small part of the kernel matrix; for the types of kernel that can be computed using matrix multiplication, the kernel matrix can be computed much faster than for others.
Abstract: Kernel k-Means is a basis for many state of the art global clustering approaches. When the number of samples grows too big, however, it is extremely time-consuming to compute the entire kernel matrix and it is impossible to store it in the memory of a single computer. The algorithm of Approximate Kernel k-Means has been proposed, which works using only a small part of the kernel matrix. The computation of the kernel matrix, even a part of it, remains a significant bottleneck of the process. Some types of kernel, however, can be computed using matrix multiplication. Modern CPU architectures and computational optimization methods allow for very fast matrix multiplication, thus those types of kernel matrices can be computed much faster than others.
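To illustrate how a kernel can be computed through matrix multiplication, the sketch below builds an RBF kernel matrix from the Gram matrix X Xᵀ, using the identity ‖x − y‖² = x·x + y·y − 2 x·y. The abstract does not name the kernels it has in mind, so the choice of the RBF kernel, the `gamma` parameter, and the function names are assumptions made for illustration. The matrix product is written in plain Python for self-containment; in practice it is the step one would hand to an optimized matrix multiplication routine.

```python
from math import exp

def gram_matrix(X):
    """Plain matrix product X X^T, i.e. all pairwise dot products."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def rbf_kernel_from_gram(X, gamma):
    """RBF kernel matrix exp(-gamma * ||x_i - x_j||^2) built from X X^T.

    The squared distances are recovered from the Gram matrix G as
    G[i][i] + G[j][j] - 2 * G[i][j], so the only heavy step is the
    matrix multiplication inside gram_matrix.
    """
    G = gram_matrix(X)
    n = len(X)
    return [
        [exp(-gamma * (G[i][i] + G[j][j] - 2.0 * G[i][j])) for j in range(n)]
        for i in range(n)
    ]
```

For the two points `[0, 0]` and `[3, 4]` the squared distance is 25, so with `gamma = 0.1` the off-diagonal kernel value is exp(−2.5) and the diagonal entries are 1.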

Proceedings ArticleDOI
04 May 2015
TL;DR: This paper proposes a method for de-identifying facial images that manipulates facial images so that humans can still recognize the individual or individuals in an image or video frame, but at the same time common automatic identification algorithms fail to do so.
Abstract: A major issue that arises from mass visual media distribution in modern video sharing, social media and cloud services is the issue of privacy. Malicious users can use these services to track the actions of certain individuals and/or groups, thus violating their privacy. As a result, the need to hinder automatic facial image identification in images and videos arises. In this paper, we propose a method for de-identifying facial images. Contrary to most de-identification methods, this method manipulates facial images so that humans can still recognize the individual or individuals in an image or video frame, but at the same time common automatic identification algorithms fail to do so. This is achieved by projecting the facial images on a hypersphere. From the conducted experiments it can be verified that this method is effective in reducing the classification accuracy to under 10%. Furthermore, in the resulting images the subject can be identified by human viewers.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: A novel algorithm for zeroing elements of the kernel matrix, thus trimming the matrix, is proposed, which results in reduced memory complexity and improved clustering performance.
Abstract: The Kernel k-Means algorithm for clustering extends the classic k-Means clustering algorithm. It uses the kernel trick to implicitly calculate distances on a higher dimensional space, thus overcoming the classic algorithm's inability to handle data that are not linearly separable. Given a set of n elements to cluster, the n × n kernel matrix is calculated, which contains the dot products in the higher dimensional space of every possible combination of two elements. This matrix is then referenced to calculate the distance between an element and a cluster center, as per classic k-Means. In this paper, we propose a novel algorithm for zeroing elements of the kernel matrix, thus trimming the matrix, which results in reduced memory complexity and improved clustering performance.
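The way the kernel matrix is referenced to obtain the distance between an element and a cluster center can be sketched as below. This shows only the standard Kernel k-Means distance and one assignment step, not the matrix-trimming algorithm the paper proposes; the function names are hypothetical, and every cluster is assumed to be non-empty. For sample i and a cluster C with mean μ in the feature space, ‖φ(xᵢ) − μ‖² = Kᵢᵢ − (2/|C|) Σⱼ Kᵢⱼ + (1/|C|²) Σⱼ,ₗ Kⱼₗ, with j and l ranging over C.

```python
def kernel_distance_sq(K, i, members):
    """Squared feature-space distance from sample i to the mean of `members`.

    K is the n x n kernel matrix (list of lists); `members` is a non-empty
    list of sample indices forming one cluster.
    """
    m = len(members)
    cross = sum(K[i][j] for j in members) / m
    within = sum(K[j][l] for j in members for l in members) / (m * m)
    return K[i][i] - 2.0 * cross + within

def assign_clusters(K, clusters):
    """One assignment step: index of the nearest cluster for every sample."""
    return [
        min(range(len(clusters)),
            key=lambda c: kernel_distance_sq(K, i, clusters[c]))
        for i in range(len(K))
    ]
```

With a linear kernel on the one-dimensional points 0, 1, 10, 11 and clusters {0, 1} and {2, 3}, the first sample lies at squared distance 0.25 from its own cluster mean (0.5), and the assignment step leaves the partition unchanged.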

Book ChapterDOI
13 Aug 2015
TL;DR: The results proved that the visual information can improve the speaker clustering accuracy and hence the diarization process.
Abstract: Multimodal clustering/diarization tries to answer the question “who spoke when” by using audio and visual information. Diarization consists of two steps: first, segmentation of the audio information and detection of the speech segments; then, clustering of the speech segments to group the speakers. This task has been mainly studied on audiovisual data from meetings, news broadcasts or talk shows. In this paper, we use visual information to aid speaker clustering. We tested the proposed method on three full-length movies, i.e. a scenario much more difficult than the ones used so far, where there is no certainty that speech segments and video appearances of actors will always overlap. The results showed that the visual information can improve the speaker clustering accuracy and hence the diarization process.

Journal ArticleDOI
TL;DR: A novel spectral clustering algorithm that combines two well-known algorithms, normalized cuts and spectral clustering, is introduced, successfully tested on three stereoscopic feature films, and compared against the state-of-the-art.
Abstract: In this work, we focus on facial image clustering techniques applied to stereoscopic videos. We introduce a novel spectral clustering algorithm which combines two well-known algorithms: normalized cuts and spectral clustering. Furthermore, we introduce two approaches for evaluating the similarities between facial images, one based on Mutual Information and the other based on Local Binary Patterns, combined with facial fiducial points and an image registration procedure. Ways of exploiting the extra information available in stereoscopic videos are also introduced. The proposed approaches are successfully tested on three stereoscopic feature films and compared against the state-of-the-art. Author Highlights: We developed a facial image clustering algorithm for stereoscopic videos. A double spectral analysis was used for performing the clustering. Features that were used included both global (Mutual Information based) and local (Local Binary Patterns). Facial image trajectory information was also used in clustering. Best results occurred for local features and multiple representative images per facial image trajectory.

Journal ArticleDOI
TL;DR: An analysis of the recently proposed sparse extreme learning machine (S-ELM) classifier is presented, and an optimization scheme for calculating the network output weights that can lead to enhanced performance is described.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: Experiments on four publicly available action recognition data sets demonstrate that the proposed unified approach increases the discriminative ability of the obtained video representation, providing enhanced action classification performance.
Abstract: In this paper, we propose a novel method for human action recognition that unifies discriminative Bag of Words (BoW)-based video representation and discriminant subspace learning. An iterative optimization scheme is proposed for sequential discriminant BoWs-based action representation and code-book adaptation based on action discrimination in a reduced dimensionality feature space where action classes are better discriminated. Experiments on four publicly available action recognition data sets demonstrate that the proposed unified approach increases the discriminative ability of the obtained video representation, providing enhanced action classification performance.

Journal ArticleDOI
01 Jan 2015
TL;DR: Experimental results on three publicly available databases show that the proposed approach outperforms facial image classification based on a single facial representation and on other facial region combination schemes.
Abstract: In this paper, we investigate the effectiveness of the Extreme Learning Machine (ELM) network in facial image classification. In order to enhance performance, we exploit knowledge related to the human face structure. We train a multi-view ELM network by employing automatically created facial regions of interest to this end. By jointly learning the network parameters and optimized network output combination weights, each facial region appropriately contributes to the final classification result. Experimental results on three publicly available databases show that the proposed approach outperforms facial image classification based on a single facial representation and on other facial region combination schemes.


Journal ArticleDOI
TL;DR: Experimental results indicate that the performance of the proposed distance-based classification schemes is comparable to (or even better than) that of the Support Vector Machine classifier (in both the linear and kernel cases), which is currently the standard choice for human action recognition.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: GE has been extended in order to integrate subclass discriminant information, resulting in the novel Subclass Graph Embedding (SGE) framework, and it is shown that SGE comprises a generalization of the typical GE including subclass DR methods.
Abstract: Recently, subspace learning methods for Dimensionality Reduction (DR), like Subclass Discriminant Analysis (SDA) and Clustering-based Discriminant Analysis (CDA), which use subclass information for the discrimination between the data classes, have attracted much attention. In parallel, important work has been accomplished on Graph Embedding (GE), which is a general framework unifying several subspace learning techniques. In this paper, GE has been extended in order to integrate subclass discriminant information, resulting in the novel Subclass Graph Embedding (SGE) framework. The kernelization of SGE is also presented. It is shown that SGE comprises a generalization of the typical GE including subclass DR methods. In this context, the theoretical link of SDA and CDA methods with SGE is established. The efficacy and power of SGE have been substantiated by comparing subclass DR methods against a variety of unimodal methods, all pertaining to the SGE framework, via a series of experiments on various real-world data.

Proceedings ArticleDOI
10 Dec 2015
TL;DR: A novel subspace learning technique is introduced for facial image analysis that takes into account the symmetric nature of facial images by properly incorporating a symmetry constraint into the objective function of the Two-Dimensional Linear Discriminant Analysis to determine symmetric projection vectors.
Abstract: In this paper, a novel subspace learning technique is introduced for facial image analysis. The proposed technique takes into account the symmetric nature of facial images. This information is exploited by properly incorporating a symmetry constraint into the objective function of the Two-Dimensional Linear Discriminant Analysis (2DLDA) to determine symmetric projection vectors. The performance of the proposed Symmetric Two-Dimensional Linear Discriminant Analysis was evaluated on real face recognition databases. Experimental results highlight the superiority of the proposed technique in comparison to the standard approach.

Journal ArticleDOI
TL;DR: The main idea is to use the graph embedding framework for these techniques and, by formulating a new minimization problem, to simultaneously optimize the kernel parameters and the projection vectors of the chosen dimensionality reduction method.
Abstract: In this paper, we propose a new method for kernel optimization in kernel-based dimensionality reduction techniques such as kernel principal component analysis and kernel discriminant analysis. The main idea is to use the graph embedding framework for these techniques and, by formulating a new minimization problem, to simultaneously optimize the kernel parameters and the projection vectors of the chosen dimensionality reduction method. Experiments are conducted on various datasets, ranging from real-world publicly available databases for classification benchmarking to facial expression and face recognition databases. Our proposed method outperforms other competing ones in classification performance. Moreover, our method provides a systematic way to deal with kernel parameters, whose calculation has so far been treated rather superficially and/or experimentally in most cases.

Journal ArticleDOI
TL;DR: An extension of the Extreme Learning Machine algorithm that is able to exploit multiple action representations and scatter information in the corresponding ELM spaces for the calculation of the networks’ parameters and the determination of optimized network combination weights is proposed.
Abstract: In this paper, we employ multiple Single-hidden Layer Feedforward Neural Networks for multi-view action recognition. We propose an extension of the Extreme Learning Machine algorithm that is able to exploit multiple action representations and scatter information in the corresponding ELM spaces for the calculation of the networks’ parameters and the determination of optimized network combination weights. The proposed algorithm is evaluated by using two state-of-the-art action video representation approaches on five publicly available action recognition databases designed for different application scenarios. Experimental comparison of the proposed approach with three commonly used video representation combination approaches and relating classification schemes illustrates that ELM networks employing a supervised view combination scheme generally outperform those exploiting unsupervised combination approaches, as well as that the exploitation of scatter information in ELM-based neural network training enhances the network’s performance.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper presents a MapReduce based distributed implementation of Nearest Neighbor and ε-ball Kernel k-Means, a state of the art clustering algorithm which employs the kernel trick in order to perform clustering on a higher dimensionality space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data.
Abstract: Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is the so-called k-Means. It is very popular; however, it is unable to handle cases in which the clusters are not linearly separable. Kernel k-Means is a state of the art clustering algorithm, which employs the kernel trick in order to perform clustering on a higher dimensionality space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. Kernel k-Means typically computes the kernel matrix, which contains the results of the kernel function for every possible sample combination. This matrix can be viewed as the weight matrix of a complete graph, where the samples are the vertices and the edges are weighted according to the similarity, given by the kernel function, between the samples they connect. In this context, it is possible to work on the Nearest Neighbor graph, where each sample is only connected to some of its closest samples, or to use only information from samples that are sufficiently close to each other, referred to as ε-ball. Doing so reduces the size of the kernel matrix and can provide improved clustering results. In this paper, we present a MapReduce based distributed implementation of Nearest Neighbor and ε-ball Kernel k-Means.
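The two graph restrictions mentioned in this abstract can be sketched on a small kernel matrix as follows. This is a single-machine illustration only, with no MapReduce distribution, and the exact construction used in the paper is not given here. The assumptions are: neighbors are ranked by kernel value (higher means closer), the Nearest Neighbor graph is symmetrized, the ε-ball is measured with the feature-space distance d² = Kᵢᵢ + Kⱼⱼ − 2Kᵢⱼ, and the function names are hypothetical.

```python
def knn_sparsify(K, k):
    """Keep K[i][j] only if j is among the k most similar samples to i
    (or i among those of j); all other off-diagonal entries become zero."""
    n = len(K)
    keep = [[i == j for j in range(n)] for i in range(n)]
    for i in range(n):
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: K[i][j],
            reverse=True,
        )[:k]
        for j in neighbours:
            keep[i][j] = keep[j][i] = True  # symmetrise the neighbour graph
    return [[K[i][j] if keep[i][j] else 0.0 for j in range(n)]
            for i in range(n)]

def eps_ball_sparsify(K, eps):
    """Keep K[i][j] only if the feature-space distance between samples
    i and j is at most eps; all other entries become zero."""
    n = len(K)

    def dist_sq(i, j):
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    return [[K[i][j] if dist_sq(i, j) <= eps * eps else 0.0
             for j in range(n)] for i in range(n)]
```

The dense list-of-lists output is used here only for readability; the memory saving the abstract refers to would come from storing just the retained entries, for example in a sparse format.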

Proceedings ArticleDOI
12 Nov 2015
TL;DR: Experimental results showed that the proposed label propagation approach achieves classification accuracy that is competitive with or better than the state of the art in all classification tasks.
Abstract: A novel method is introduced for label propagation on similarity tensors. The proposed method operates on data with multiple representations. A higher order similarity matrix is constructed for describing the relationship between the data representations in different modalities. Then, label propagation is performed on the above-mentioned similarity matrix, by extending the state of the art label propagation method with local and global consistency to the case of higher order similarity graphs. The evaluation of the proposed method was performed on two classification tasks: person recognition on facial images extracted from three stereo movies and human action recognition on two data sets consisting of videos downloaded from YouTube. Experimental results showed that the proposed label propagation approach achieves classification accuracy that is competitive with or better than the state of the art in all classification tasks.
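For orientation, the baseline that this abstract extends, label propagation with local and global consistency on an ordinary (second-order) similarity graph, can be sketched as below. The tensor extension proposed in the paper is not shown. The sketch assumes a symmetric, non-negative similarity matrix `W` with zero diagonal in which every sample has at least one neighbor, and the function name, `alpha`, and the iteration count are illustrative choices. It iterates F ← αSF + (1 − α)Y with S = D^(−1/2) W D^(−1/2), where Y holds the initial labels.

```python
from math import sqrt

def propagate_labels(W, Y, alpha=0.9, iterations=200):
    """Label propagation with local and global consistency (second-order case).

    W: symmetric similarity matrix with zero diagonal (list of lists)
    Y: initial label matrix, one row per sample and one column per class,
       with 1 for a known label and all zeros for unlabeled samples
    Returns the predicted class index of every sample.
    """
    n, n_classes = len(W), len(Y[0])
    degree = [sum(row) for row in W]
    # Symmetrically normalised similarity S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / sqrt(degree[i] * degree[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iterations):
        # Each sample absorbs its neighbours' scores and keeps part of
        # its own initial label information.
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]
```

On a four-sample chain in which the two end samples carry different labels and the middle link is weak, each unlabeled sample takes the label of the end it is strongly connected to.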