
Showing papers by "Hazim Kemal Ekenel published in 2017"


Journal ArticleDOI
TL;DR: The results indicate that high levels of noise, blur, missing pixels, and brightness have a detrimental effect on the verification performance of all models, whereas the impact of contrast changes and compression artifacts is limited.
Abstract: Deep convolutional neural network (CNN) based approaches are the state of the art in various computer vision tasks, including face recognition. Considerable research effort is currently being directed towards further improving deep CNNs by focusing on more powerful model architectures and better learning techniques. However, studies systematically exploring the strengths and weaknesses of existing deep models for face recognition are still relatively scarce in the literature. In this paper, we try to fill this gap and study the effects of different covariates on the verification performance of four recent deep CNN models using the Labeled Faces in the Wild (LFW) dataset. Specifically, we investigate the influence of covariates related to image quality (blur, JPEG compression, occlusion, noise, image brightness, contrast, and missing pixels) and model characteristics (CNN architecture, color information, and descriptor computation), and analyze their impact on the face verification performance of AlexNet, VGG-Face, GoogLeNet, and SqueezeNet. Based on comprehensive and rigorous experimentation, we identify the strengths and weaknesses of the deep learning models, and present key areas for potential future research. Our results indicate that high levels of noise, blur, missing pixels, and brightness have a detrimental effect on the verification performance of all models, whereas the impact of contrast changes and compression artifacts is limited. We also find that the descriptor computation strategy and color information do not have a significant influence on performance.
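A minimal sketch (not the paper's protocol) of the kind of covariate probing described above: one image of a verification pair is degraded with an assumed parameterisation of blur, JPEG compression, or noise, and the cosine similarity between CNN embeddings is thresholded. `embed` is a placeholder for any of the evaluated networks' feature extractors.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, kind: str, level: float) -> Image.Image:
    """Apply one image-quality covariate (parameterisations are assumptions)."""
    if kind == "blur":
        return img.filter(ImageFilter.GaussianBlur(radius=level))
    if kind == "jpeg":
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=int(level))   # level = JPEG quality factor
        buf.seek(0)
        return Image.open(buf).convert("RGB")
    if kind == "noise":
        arr = np.asarray(img, dtype=np.float32)
        arr += np.random.normal(0.0, level, arr.shape)      # additive Gaussian noise
        return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    raise ValueError(f"unknown covariate: {kind}")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def verify_pair(embed, img_a: Image.Image, img_b: Image.Image,
                kind: str, level: float, threshold: float = 0.5) -> bool:
    """Degrade one image of a pair, then threshold the embedding similarity."""
    return cosine(embed(img_a), embed(degrade(img_b, kind, level))) >= threshold
```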

120 citations


Proceedings ArticleDOI
TL;DR: The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.
Abstract: In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.

50 citations


Journal ArticleDOI
TL;DR: This work presents a novel face deidentification pipeline, which ensures anonymity by synthesizing artificial surrogate faces using generative neural networks (GNNs) to deidentify subjects in images or video, while preserving non-identity-related aspects of the data and consequently enabling data utilization.
Abstract: Face deidentification is an active topic amongst privacy and security researchers. Early deidentification methods relying on image blurring or pixelisation have been replaced in recent years with techniques based on formal anonymity models that provide privacy guarantees and retain certain characteristics of the data even after deidentification. The latter aspect is important, as it allows the deidentified data to be used in applications for which identity information is irrelevant. In this work, the authors present a novel face deidentification pipeline, which ensures anonymity by synthesising artificial surrogate faces using generative neural networks (GNNs). The generated faces are used to deidentify subjects in images or videos, while preserving non-identity-related aspects of the data and consequently enabling data utilisation. Since generative networks are highly adaptive and can utilise diverse parameters (pertaining to the appearance of the generated output in terms of facial expressions, gender, race, etc.), they represent a natural choice for the problem of face deidentification. To demonstrate the feasibility of the authors’ approach, they perform experiments using automated recognition tools and human annotators. Their results show that the recognition performance on deidentified images is close to chance, suggesting that the deidentification process based on GNNs is effective.
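A minimal sketch of the overall replacement step, under stated assumptions rather than the authors' pipeline: `detect_face` and `generate_face` are placeholders for a face detector and an attribute-conditioned generative model, with the attributes carrying the non-identity information (expression, gender, etc.) that the method aims to preserve.

```python
import numpy as np

def deidentify(image: np.ndarray, detect_face, generate_face, attributes: dict) -> np.ndarray:
    """Replace every detected face in `image` with a synthetic surrogate face."""
    out = image.copy()
    for (x, y, w, h) in detect_face(image):                  # face boxes as (x, y, w, h)
        surrogate = generate_face(attributes, size=(h, w))   # HxWx3 uint8 synthetic face
        out[y:y + h, x:x + w] = surrogate                    # original identity removed here
    return out
```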

50 citations


Book ChapterDOI
10 Jul 2017
TL;DR: A fully automated computer vision application for littering quantification based on images taken from the streets and sidewalks using a deep learning based framework to localize and classify different types of wastes.
Abstract: Littering quantification is an important step for improving the cleanliness of cities. When human interpretation is too cumbersome, or in some cases impossible, an objective index of cleanliness could reduce littering through awareness actions. In this paper, we present a fully automated computer vision application for littering quantification based on images taken from streets and sidewalks. We have employed a deep learning based framework to localize and classify different types of wastes. Since there was no waste dataset available, we built our own acquisition system, mounted on a vehicle, and collected images containing different types of wastes. These images were then annotated for training and benchmarking the developed system. Our results on real case scenarios show accurate detection of littering against varied backgrounds.

50 citations


Proceedings ArticleDOI
01 Aug 2017
TL;DR: In this article, LiDAR data is utilized to generate region proposals by processing the three dimensional point cloud that it provides and these candidate regions are then further processed by a state-of-the-art CNN classifier that has been fine-tuned for pedestrian detection.
Abstract: Pedestrian detection is an important component for safety of autonomous vehicles, as well as for traffic and street surveillance. There are extensive benchmarks on this topic and it has been shown to be a challenging problem when applied on real use-case scenarios. In purely image-based pedestrian detection approaches, the state-of-the-art results have been achieved with convolutional neural networks (CNN) and surprisingly few detection frameworks have been built upon multi-cue approaches. In this work, we develop a new pedestrian detector for autonomous vehicles that exploits LiDAR data, in addition to visual information. In the proposed approach, LiDAR data is utilized to generate region proposals by processing the three dimensional point cloud that it provides. These candidate regions are then further processed by a state-of-the-art CNN classifier that we have fine-tuned for pedestrian detection. We have extensively evaluated the proposed detection process on the KITTI dataset. The experimental results show that the proposed LiDAR space clustering approach provides a very efficient way of generating region proposals leading to higher recall rates and fewer misses for pedestrian detection. This indicates that LiDAR data can provide auxiliary information for CNN-based approaches.
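A minimal sketch (assumed details, not the paper's implementation) of the proposal-generation idea described above: LiDAR points are clustered, each cluster's bounding box is projected into the image with a camera projection matrix such as the one supplied by KITTI calibration, and the resulting crops are scored by a fine-tuned CNN classifier (here a placeholder `cnn_score`).

```python
import numpy as np
from sklearn.cluster import DBSCAN

def lidar_region_proposals(points: np.ndarray, P: np.ndarray,
                           eps: float = 0.5, min_samples: int = 10):
    """points: Nx3 LiDAR points in (rectified) camera coordinates; P: 3x4 projection matrix.
    Returns candidate 2D boxes (x1, y1, x2, y2) in the image plane."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    boxes = []
    for label in set(labels) - {-1}:                         # -1 marks noise points
        cluster = points[labels == label]
        homog = np.hstack([cluster, np.ones((len(cluster), 1))])  # Nx4 homogeneous points
        uv = (P @ homog.T).T
        uv = uv[:, :2] / uv[:, 2:3]                          # perspective division
        boxes.append((uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()))
    return boxes

def detect_pedestrians(image, points, P, cnn_score, threshold=0.5):
    """Keep the proposals that the fine-tuned CNN scores above `threshold`."""
    detections = []
    for (x1, y1, x2, y2) in lidar_region_proposals(points, P):
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        if crop.size and cnn_score(crop) >= threshold:
            detections.append((x1, y1, x2, y2))
    return detections
```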

41 citations


Journal ArticleDOI
TL;DR: In this paper, a face deidentification pipeline is presented, which ensures anonymity by synthesizing artificial surrogate faces using generative neural networks (GNNs), which are used to deidentify subjects in images or video, while preserving non-identity-related aspects of the data and consequently enabling data utilization.
Abstract: Face deidentification is an active topic amongst privacy and security researchers. Early deidentification methods relying on image blurring or pixelization were replaced in recent years with techniques based on formal anonymity models that provide privacy guarantees and at the same time aim at retaining certain characteristics of the data even after deidentification. The latter aspect is particularly important, as it allows the deidentified data to be exploited in applications for which identity information is irrelevant. In this work we present a novel face deidentification pipeline, which ensures anonymity by synthesizing artificial surrogate faces using generative neural networks (GNNs). The generated faces are used to deidentify subjects in images or video, while preserving non-identity-related aspects of the data and consequently enabling data utilization. Since generative networks are very adaptive and can utilize a diverse set of parameters (pertaining to the appearance of the generated output in terms of facial expressions, gender, race, etc.), they represent a natural choice for the problem of face deidentification. To demonstrate the feasibility of our approach, we perform experiments using automated recognition tools and human annotators. Our results show that the recognition performance on deidentified images is close to chance, suggesting that the deidentification process based on GNNs is highly effective.

39 citations


Book ChapterDOI
TL;DR: In this paper, a fully automated computer vision application for littering quantification based on images taken from the streets and sidewalks is presented. But there was no waste dataset available, so they built their acquisition system mounted on a vehicle and collected images containing different types of wastes.
Abstract: Littering quantification is an important step for improving the cleanliness of cities. When human interpretation is too cumbersome, or in some cases impossible, an objective index of cleanliness could reduce littering through awareness actions. In this paper, we present a fully automated computer vision application for littering quantification based on images taken from streets and sidewalks. We have employed a deep learning based framework to localize and classify different types of wastes. Since there was no waste dataset available, we built our own acquisition system, mounted on a vehicle, and collected images containing different types of wastes. These images were then annotated for training and benchmarking the developed system. Our results on real case scenarios show accurate detection of littering against varied backgrounds.

16 citations


Posted Content
TL;DR: The Unconstrained Ear Recognition Challenge (UERC) as mentioned in this paper was a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions, where the goal was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future.
Abstract: In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered around the problem of person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available for the challenge by the UERC organizers. A comprehensive analysis was conducted with all participating approaches addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition and others. The top performer of the UERC was found to ensure robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.

15 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this article, Gaussian Mixture Models (GMMs) are used to capture statistical relationships among convolution filters learned from a well-trained network and transfer this knowledge to another network.
Abstract: In this paper, we introduce a new regularization technique for transfer learning. The aim of the proposed approach is to capture statistical relationships among convolution filters learned from a well-trained network and transfer this knowledge to another network. Since convolution filters of the prevalent deep Convolutional Neural Network (CNN) models share a number of similar patterns, in order to speed up the learning procedure, we capture such correlations by Gaussian Mixture Models (GMMs) and transfer them using a regularization term. We have conducted extensive experiments on the CIFAR10, Places2, and CM-Places datasets to assess generalizability, task transferability, and cross-model transferability of the proposed approach, respectively. The experimental results show that the feature representations have efficiently been learned and transferred through the proposed statistical regularization scheme. Moreover, our method is an architecture independent approach, which is applicable for a variety of CNN architectures.
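A minimal sketch of one interpretation of the statistical regularization described above (not the authors' exact formulation): a GMM is fitted to the flattened convolution filters of a well-trained source network, and the target network's filters are penalised by their negative log-likelihood under that GMM, added to the task loss as a regularization term.

```python
import numpy as np
import torch
from sklearn.mixture import GaussianMixture

def fit_filter_gmm(source_conv_weight: torch.Tensor, n_components: int = 5) -> GaussianMixture:
    """source_conv_weight: (out_ch, in_ch, k, k). One GMM sample per flattened filter."""
    filters = source_conv_weight.detach().cpu().numpy().reshape(source_conv_weight.shape[0], -1)
    return GaussianMixture(n_components=n_components, covariance_type="diag").fit(filters)

def gmm_regularizer(target_conv_weight: torch.Tensor, gmm: GaussianMixture) -> torch.Tensor:
    """Differentiable negative log-likelihood of the target filters under the source GMM."""
    w = target_conv_weight.reshape(target_conv_weight.shape[0], -1)           # (F, D)
    means = torch.as_tensor(gmm.means_, dtype=w.dtype, device=w.device)        # (K, D)
    var = torch.as_tensor(gmm.covariances_, dtype=w.dtype, device=w.device)    # (K, D), diagonal
    logw = torch.log(torch.as_tensor(gmm.weights_, dtype=w.dtype, device=w.device))
    diff = w.unsqueeze(1) - means.unsqueeze(0)                                  # (F, K, D)
    log_comp = -0.5 * ((diff ** 2 / var).sum(-1)
                       + torch.log(var).sum(-1)
                       + var.shape[-1] * np.log(2 * np.pi))                     # (F, K)
    return -torch.logsumexp(logw + log_comp, dim=1).mean()

# usage (hypothetical layer name and weighting):
# total_loss = task_loss + lambda_reg * gmm_regularizer(model.conv1.weight, gmm)
```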

13 citations


Book ChapterDOI
TL;DR: This work shows that the proposed method outperforms the baseline techniques on the OuluVS2 audiovisual database for frontal-view phrase recognition, with cross-validation and test sentence correctness reaching 79% and 73%, respectively, compared to the baseline's 74% on cross-validation.
Abstract: Automatic visual speech recognition is an interesting problem in pattern recognition, especially when audio data is noisy or not readily available. It is also a very challenging task, mainly because of the lower amount of information in the visual articulations compared to the audible utterance. In this work, principal component analysis is applied to image patches - extracted from the video data - to learn the weights of a two-stage convolutional network. Block histograms are then extracted as the unsupervised learning features. These features are employed to learn a recurrent neural network with a set of long short-term memory cells to obtain spatiotemporal features. Finally, the obtained features are used in a tandem GMM-HMM system for speech recognition. Our results show that the proposed method outperforms the baseline techniques on the OuluVS2 audiovisual database for frontal-view phrase recognition, with cross-validation and test sentence correctness reaching 79% and 73%, respectively, compared to the baseline's 74% on cross-validation.
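A minimal, PCANet-style sketch of the unsupervised filter-learning stage described above, under assumed patch and filter sizes: principal directions of mean-removed image patches are reshaped into convolution filters; the paper applies this in two stages and then takes block histograms as features.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def learn_pca_filters(frames: np.ndarray, patch_size: int = 7,
                      n_filters: int = 8, max_patches: int = 100_000) -> np.ndarray:
    """frames: (N, H, W) grayscale mouth-region images. Returns (n_filters, k, k) filters."""
    patches = []
    for img in frames:
        win = sliding_window_view(img, (patch_size, patch_size))
        win = win.reshape(-1, patch_size * patch_size).astype(np.float64)
        patches.append(win - win.mean(axis=1, keepdims=True))    # remove per-patch mean
    X = np.concatenate(patches, axis=0)
    if len(X) > max_patches:                                      # subsample for tractability
        X = X[np.random.choice(len(X), max_patches, replace=False)]
    # leading principal directions of the patch set become convolution filters
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_filters].reshape(n_filters, patch_size, patch_size)
```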

5 citations


Proceedings ArticleDOI
01 May 2017
TL;DR: CerbB2 tumor scores were generated for the cell fragments, which were classified with high performance with the aid of convolutional neural networks (CNNs).
Abstract: This study proposes a unique approach to classifying CerbB2 tumor cell scores in breast cancer based on deep learning models. Another contribution of the study is the creation of a dataset from original breast cancer tissues. For the purpose of training, validating, and testing the deep learning models, cell fragments were generated from sample tissue images. CerbB2 tumor scores were generated for the cell fragments, which were then classified with high performance with the aid of convolutional neural networks (CNNs).

Posted Content
TL;DR: The experimental results show that the proposed LiDAR space clustering approach provides a very efficient way of generating region proposals leading to higher recall rates and fewer misses for pedestrian detection, indicating that LiDAR data can provide auxiliary information for CNN-based approaches.
Abstract: Pedestrian detection is an important component for safety of autonomous vehicles, as well as for traffic and street surveillance. There are extensive benchmarks on this topic and it has been shown to be a challenging problem when applied on real use-case scenarios. In purely image-based pedestrian detection approaches, the state-of-the-art results have been achieved with convolutional neural networks (CNN) and surprisingly few detection frameworks have been built upon multi-cue approaches. In this work, we develop a new pedestrian detector for autonomous vehicles that exploits LiDAR data, in addition to visual information. In the proposed approach, LiDAR data is utilized to generate region proposals by processing the three dimensional point cloud that it provides. These candidate regions are then further processed by a state-of-the-art CNN classifier that we have fine-tuned for pedestrian detection. We have extensively evaluated the proposed detection process on the KITTI dataset. The experimental results show that the proposed LiDAR space clustering approach provides a very efficient way of generating region proposals leading to higher recall rates and fewer misses for pedestrian detection. This indicates that LiDAR data can provide auxiliary information for CNN-based approaches.

Proceedings ArticleDOI
15 May 2017
TL;DR: This study used state-of-the-art convolutional neural network models to correctly label and classify documents so that they can be archived in an accessible manner.
Abstract: Despite the increase in digitization, the use of documents is still very common today. It is essential that these documents are correctly labeled and classified so that they can be archived in an accessible manner. In this study, we used state-of-the-art convolutional neural network models to satisfy this need. Convolutional neural networks achieve high performance compared to alternative methods in the field of classification, due to the strong and rich features they can learn from large data through deep architectures. For the experiments, we have used a dataset containing 400,000 images of 16 different document classes. The state-of-the-art deep learning models have been fine-tuned and compared in detail. The VGG-16 architecture achieved the best performance on this dataset, with a 90.93% correct classification rate.
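A minimal sketch of the fine-tuning setup described above, with assumed framework and hyper-parameters (the paper does not state them): an ImageNet-pretrained VGG-16 has its final fully connected layer replaced with a 16-way document-class head and is trained with a small learning rate.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_document_classifier(num_classes: int = 16) -> nn.Module:
    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)  # ImageNet-pretrained
    model.classifier[6] = nn.Linear(4096, num_classes)                # replace 1000-way head
    return model

model = build_document_classifier()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# standard fine-tuning loop over 224x224 document images (dataloader assumed):
# for images, labels in loader:
#     optimizer.zero_grad()
#     loss = criterion(model(images), labels)
#     loss.backward()
#     optimizer.step()
```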

Proceedings ArticleDOI
01 Apr 2017
TL;DR: This paper aims to distinguish whether the subject is under-challenged or over-challenged using psychophysiological signal data collected from biofeedback sensors while executing the tasks with RehabRoby.
Abstract: Investigation into robot-assisted rehabilitation systems, and robot-assisted systems that are capable of detecting patients' emotions and then modifying the rehabilitation task to better suit the patients' abilities by taking their emotions into account, has gained momentum in recent years. In this paper, our aim is to distinguish whether the subject is under-challenged or over-challenged using psychophysiological signal data collected from biofeedback sensors while executing tasks with RehabRoby. Initially, features are extracted from the physiological signals (Blood Volume Pulse (BVP), Skin Conductance (SC), and Skin Temperature (ST)). The extracted features are examined in terms of their contribution to the classification of the overstressed/over-challenged and bored/under-challenged states using analysis of variance (ANOVA). The most significant features are selected, and various classification methods are used to classify the overstressed/over-challenged and bored/under-challenged states.
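A minimal sketch (not the study's code) of this feature-selection-then-classify pipeline: ANOVA F-test selection of the most discriminative physiological features, followed by one possible classifier; the paper compares several classification methods, and the SVM here is just an illustrative choice.

```python
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def challenge_level_classifier(n_selected_features: int = 10):
    """ANOVA-based feature selection followed by an RBF-kernel SVM."""
    return make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=n_selected_features),  # ANOVA F-test selection
        SVC(kernel="rbf"),
    )

# X: (n_trials, n_features) statistics extracted from BVP/SC/ST signals
# y: 0 = under-challenged/bored, 1 = over-challenged/overstressed
# scores = cross_val_score(challenge_level_classifier(), X, y, cv=5)
```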

Proceedings ArticleDOI
25 Aug 2017
TL;DR: Results show that the complementary information contained in recordings from different view angles improves the results significantly and the sentence correctness on the test set is increased from 76% for the highest performing single view to up to 83% when combining this view with the frontal and $60^\circ$ view angles.
Abstract: Visual speech recognition is a challenging research problem with a particular practical application of aiding audio speech recognition in noisy scenarios. Multiple camera setups can be beneficial for visual speech recognition systems in terms of improved performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The thorough analysis covers fusion of all possible view angle combinations both at the feature level and the decision level. The employed visual speech recognition system in this study extracts features through a PCA-based convolutional neural network, followed by an LSTM network. Finally, these features are processed in a tandem system, being fed into a GMM-HMM scheme. The decision fusion acts after this point by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves the results significantly. For example, the sentence correctness on the test set is increased from 76% for the highest performing single view (30 degrees) to up to 83% when combining this view with the frontal and 60-degree view angles.
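A minimal sketch of the decision-level fusion step described above, under an assumed (possibly weighted) summation scheme: each view's recognizer assigns a Viterbi path log-likelihood to every candidate phrase, the per-view scores are combined, and the best-scoring phrase is selected. The phrase strings below are only illustrative.

```python
from typing import Dict, List, Optional

def fuse_views(view_scores: List[Dict[str, float]],
               view_weights: Optional[List[float]] = None) -> str:
    """view_scores[v][phrase] = Viterbi path log-likelihood of `phrase` under view v."""
    if view_weights is None:
        view_weights = [1.0] * len(view_scores)          # equal weighting by default
    phrases = view_scores[0].keys()
    fused = {p: sum(w * scores[p] for w, scores in zip(view_weights, view_scores))
             for p in phrases}
    return max(fused, key=fused.get)                     # best fused hypothesis

# example with two views and two candidate phrases:
# fuse_views([{"excuse me": -120.4, "thank you": -133.1},
#             {"excuse me": -118.9, "thank you": -125.7}])  -> "excuse me"
```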

Posted Content
TL;DR: In this article, a comprehensive study on combining multiple views for visual speech recognition is presented, which covers fusion of all possible view angle combinations both at feature level and decision level, and the results show that complementary information contained in recordings from different view angles improves the results significantly.
Abstract: Visual speech recognition is a challenging research problem with a particular practical application of aiding audio speech recognition in noisy scenarios. Multiple camera setups can be beneficial for the visual speech recognition systems in terms of improved performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The thorough analysis covers fusion of all possible view angle combinations both at feature level and decision level. The employed visual speech recognition system in this study extracts features through a PCA-based convolutional neural network, followed by an LSTM network. Finally, these features are processed in a tandem system, being fed into a GMM-HMM scheme. The decision fusion acts after this point by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves the results significantly. For example, the sentence correctness on the test set is increased from 76% for the highest performing single view ($30^\circ$) to up to 83% when combining this view with the frontal and $60^\circ$ view angles.

Proceedings ArticleDOI
01 May 2017
TL;DR: The Extended Cohn-Kanade (CK+) dataset, which is commonly used for facial expression classification, is chosen, and match and mismatch facial expressions are classified using support vector machines to provide a baseline approach for the proposed pair matching formulation.
Abstract: In this study, facial expression recognition is defined as a pair matching problem. Our objectives in formulating the task this way are to be able to decide whether the facial expressions in unlabeled images of two people are the same or different, and to benefit from the pair matching methods that have been studied for many years in the face recognition field. The Extended Cohn-Kanade (CK+) dataset, which is commonly used for facial expression classification, is chosen to obtain match and mismatch pairs. To provide a baseline approach for the proposed pair matching formulation, feature extraction using local binary patterns is applied, and match and mismatch facial expressions are classified using support vector machines. A matching accuracy of 99.28% was achieved.
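A minimal sketch of a baseline of this kind, with assumed LBP parameters and pair representation (the paper does not specify them): uniform LBP histograms are extracted from each face image, a pair is represented by the absolute difference of its two histograms, and an SVM decides whether the expressions match.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray: np.ndarray, n_points: int = 8, radius: int = 1) -> np.ndarray:
    """Normalized uniform-LBP histogram of a grayscale face image."""
    lbp = local_binary_pattern(gray, n_points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=n_points + 2, range=(0, n_points + 2), density=True)
    return hist

def pair_feature(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Represent an expression pair by the absolute difference of its LBP histograms."""
    return np.abs(lbp_histogram(img_a) - lbp_histogram(img_b))

# X = np.stack([pair_feature(a, b) for a, b in pairs]); y = 1 for match, 0 for mismatch
# clf = SVC(kernel="linear").fit(X, y)
```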

Posted Content
TL;DR: In this article, Gaussian Mixture Models (GMMs) are used to capture statistical relationships among convolution filters learned from a well-trained network and transfer this knowledge to another network.
Abstract: In this paper, we introduce a new regularization technique for transfer learning. The aim of the proposed approach is to capture statistical relationships among convolution filters learned from a well-trained network and transfer this knowledge to another network. Since convolution filters of the prevalent deep Convolutional Neural Network (CNN) models share a number of similar patterns, in order to speed up the learning procedure, we capture such correlations by Gaussian Mixture Models (GMMs) and transfer them using a regularization term. We have conducted extensive experiments on the CIFAR10, Places2, and CMPlaces datasets to assess generalizability, task transferability, and cross-model transferability of the proposed approach, respectively. The experimental results show that the feature representations have efficiently been learned and transferred through the proposed statistical regularization scheme. Moreover, our method is an architecture independent approach, which is applicable for a variety of CNN architectures.