
Showing papers by "Hazim Kemal Ekenel published in 2018"


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, an end-to-end network called Cycle-Dehaze, which does not require pairs of hazy and corresponding ground truth images for training, is presented.
Abstract: In this paper, we present an end-to-end network, called Cycle-Dehaze, for the single image dehazing problem, which does not require pairs of hazy and corresponding ground-truth images for training. That is, we train the network by feeding clean and hazy images in an unpaired manner. Moreover, the proposed approach does not rely on estimation of the atmospheric scattering model parameters. Our method enhances the CycleGAN formulation by combining cycle-consistency and perceptual losses in order to improve the quality of textural information recovery and generate visually better haze-free images. Typically, deep learning models for dehazing take low-resolution images as input and produce low-resolution outputs. However, in the NTIRE 2018 challenge on single image dehazing, high-resolution images were provided. Therefore, we apply bicubic downscaling. After obtaining low-resolution outputs from the network, we utilize the Laplacian pyramid to upscale the output images to the original resolution. We conduct experiments on the NYU-Depth, I-HAZE, and O-HAZE datasets. Extensive experiments demonstrate that the proposed approach improves the CycleGAN method both quantitatively and qualitatively.
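As a rough illustration of the loss design described above, the sketch below (not the authors' released code; the generator definitions, VGG truncation point, and loss weights are assumptions) combines a CycleGAN-style cycle-consistency term with a VGG16-feature perceptual term. The Laplacian-pyramid upscaling step is not covered here.

```python
import torch
import torch.nn.functional as F_nn
from torchvision.models import vgg16

# Frozen ImageNet-pretrained VGG16 features for the perceptual loss
# (the truncation point and the omission of ImageNet input normalisation
# are simplifications for this sketch).
vgg_features = vgg16(pretrained=True).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

def cycle_perceptual_loss(hazy, clean, G, F, lambda_cycle=10.0, lambda_perc=1.0):
    """Cycle-consistency (L1) plus VGG-feature (perceptual) loss for one batch.

    G and F are assumed CycleGAN-style generators: G maps hazy -> clean,
    F maps clean -> hazy.
    """
    rec_hazy = F(G(hazy))    # hazy -> dehazed -> re-hazed
    rec_clean = G(F(clean))  # clean -> hazed -> re-cleaned
    # Standard CycleGAN cycle-consistency term.
    l_cycle = F_nn.l1_loss(rec_hazy, hazy) + F_nn.l1_loss(rec_clean, clean)
    # Perceptual term: match VGG features of originals and reconstructions.
    l_perc = (F_nn.mse_loss(vgg_features(rec_hazy), vgg_features(hazy))
              + F_nn.mse_loss(vgg_features(rec_clean), vgg_features(clean)))
    return lambda_cycle * l_cycle + lambda_perc * l_perc
```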

301 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the influence of covariates related to image quality and model characteristics, and analyse their impact on the face verification performance of different deep CNN models using the Labelled Faces in the Wild dataset.
Abstract: Convolutional neural network (CNN) based approaches are the state of the art in various computer vision tasks, including face recognition. Considerable research effort is currently being directed toward further improving CNNs by focusing on model architectures and training techniques. However, studies systematically exploring the strengths and weaknesses of existing deep models for face recognition are still relatively scarce. In this paper, we try to fill this gap and study the effects of different covariates on the verification performance of four recent CNN models using the Labelled Faces in the Wild dataset. Specifically, we investigate the influence of covariates related to image quality and model characteristics, and analyse their impact on the face verification performance of different deep CNN models. Based on comprehensive and rigorous experimentation, we identify the strengths and weaknesses of the deep learning models, and present key areas for potential future research. Our results indicate that high levels of noise, blur, missing pixels, and brightness have a detrimental effect on the verification performance of all models, whereas the impact of contrast changes and compression artefacts is limited. We find that the descriptor-computation strategy and colour information do not have a significant influence on performance.
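The exact perturbation protocol is not given here, but the covariates named above (noise, blur, brightness, compression artefacts, missing pixels) can be simulated along these lines; all parameter values in this sketch are illustrative assumptions:

```python
import io
import numpy as np
from PIL import Image, ImageFilter, ImageEnhance

def add_noise(img, sigma=25):
    """Additive Gaussian noise with the given standard deviation."""
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0, sigma, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

def blur(img, radius=3):
    """Gaussian blur with the given radius in pixels."""
    return img.filter(ImageFilter.GaussianBlur(radius))

def brightness(img, factor=1.8):
    """Scale brightness; factor > 1 brightens, < 1 darkens."""
    return ImageEnhance.Brightness(img).enhance(factor)

def jpeg_compress(img, quality=10):
    """Round-trip through JPEG at low quality to introduce compression artefacts."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf)

def drop_pixels(img, frac=0.2):
    """Set a random fraction of pixels to black to simulate missing pixels."""
    arr = np.asarray(img).copy()
    mask = np.random.rand(*arr.shape[:2]) < frac
    arr[mask] = 0
    return Image.fromarray(arr)
```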

146 citations


Posted Content
TL;DR: This paper presents an end-to-end network, called Cycle-Dehaze, for the single image dehazing problem, which does not require pairs of hazy and corresponding ground-truth images for training, and improves the CycleGAN method both quantitatively and qualitatively.
Abstract: In this paper, we present an end-to-end network, called Cycle-Dehaze, for the single image dehazing problem, which does not require pairs of hazy and corresponding ground-truth images for training. That is, we train the network by feeding clean and hazy images in an unpaired manner. Moreover, the proposed approach does not rely on estimation of the atmospheric scattering model parameters. Our method enhances the CycleGAN formulation by combining cycle-consistency and perceptual losses in order to improve the quality of textural information recovery and generate visually better haze-free images. Typically, deep learning models for dehazing take low-resolution images as input and produce low-resolution outputs. However, in the NTIRE 2018 challenge on single image dehazing, high-resolution images were provided. Therefore, we apply bicubic downscaling. After obtaining low-resolution outputs from the network, we utilize the Laplacian pyramid to upscale the output images to the original resolution. We conduct experiments on the NYU-Depth, I-HAZE, and O-HAZE datasets. Extensive experiments demonstrate that the proposed approach improves the CycleGAN method both quantitatively and qualitatively.

111 citations


Journal ArticleDOI
TL;DR: This paper shows the importance of domain adaptation when deep convolutional neural network models are used for ear recognition, and presents a new ear dataset, collected using the Multi-PIE face dataset, to enable domain adaptation.
Abstract: Here, the authors have extensively investigated the unconstrained ear recognition problem. The authors have first shown the importance of domain adaptation when deep convolutional neural network (CNN) models are used for ear recognition. To enable domain adaptation, the authors have collected a new ear data set using the Multi-PIE face data set, which they named the Multi-PIE ear data set. The authors have analysed in depth the effect of ear image quality, for example, illumination and aspect ratio, on the classification performance. Finally, the authors have addressed the problem of data set bias in the ear recognition field. Experiments on the UERC data set have shown that domain adaptation leads to a significant performance improvement. For example, when the VGG-16 model is used and domain adaptation is applied, an absolute increase of around 10% has been achieved. Combining different deep CNN models has further improved the accuracy by 4%. In the experiments that the authors have conducted to examine the data set bias, given an ear image, they were able to classify the data set that it has come from with 99.71% accuracy, which indicates a strong bias among the ear recognition data sets.
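The two-stage fine-tuning behind this kind of domain adaptation can be sketched as follows (not the authors' code; hyperparameters, dataset directories, and class counts are placeholder assumptions):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.models import vgg16

def fine_tune(model, loader, num_classes, epochs=10, lr=1e-4):
    """Replace the classifier head and fine-tune the whole network."""
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    return model

# Hypothetical directory layouts; one sub-folder per identity.
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
multipie_ds = datasets.ImageFolder("multipie_ear/", tf)  # large intermediate set
target_ds = datasets.ImageFolder("target_ear/", tf)      # small target set
multipie_loader = DataLoader(multipie_ds, batch_size=32, shuffle=True)
target_loader = DataLoader(target_ds, batch_size=32, shuffle=True)

model = vgg16(pretrained=True)                                        # stage 0: ImageNet weights
model = fine_tune(model, multipie_loader, len(multipie_ds.classes))   # stage 1: Multi-PIE ears
model = fine_tune(model, target_loader, len(target_ds.classes))       # stage 2: target ears
```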

44 citations


Proceedings ArticleDOI
07 Jun 2018
TL;DR: In this paper, the authors presented a detailed analysis on extracting soft biometric traits, age and gender, from ear images, using both geometric features and appearance-based features for ear representation.
Abstract: In this paper, we present a detailed analysis of extracting soft biometric traits, age and gender, from ear images. Although there have been a few previous works on gender classification using ear images, to the best of our knowledge, this study is the first work on age classification from ear images. In the study, we have utilized both geometric features and appearance-based features for ear representation. The utilized geometric features are based on eight anthropometric landmarks and consist of 14 distance measurements and two area calculations. The appearance-based methods employ deep convolutional neural networks for representation and classification. The well-known convolutional neural network models, namely AlexNet, VGG-16, GoogLeNet, and SqueezeNet, have been adopted for the study. They have been fine-tuned on a large-scale ear dataset that has been built from the profile and close-to-profile face images in the Multi-PIE face dataset. This way, we have performed domain adaptation. The updated models have been fine-tuned once more on the small-scale target ear dataset, which contains only around 270 ear images for training. According to the experimental results, appearance-based methods have been found to be superior to the methods based on geometric features. We have achieved 94% accuracy for gender classification, whereas 52% accuracy has been obtained for age classification. These results indicate that ear images provide useful cues for age and gender classification; however, further work is required for age estimation.
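The paper's exact 14 distance and two area definitions are not listed here, so the sketch below only illustrates the general recipe: pairwise landmark distances plus polygon areas computed from eight (x, y) landmark points. The chosen pairs and polygons are placeholders, not the paper's definitions.

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula for the area of a polygon given (x, y) vertices in order."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def geometric_features(landmarks, distance_pairs, area_indices):
    """landmarks: (8, 2) array; distance_pairs: list of (i, j) index pairs;
    area_indices: list of index tuples defining polygons."""
    dists = [np.linalg.norm(landmarks[i] - landmarks[j]) for i, j in distance_pairs]
    areas = [polygon_area(landmarks[list(idx)]) for idx in area_indices]
    return np.array(dists + areas)

# Example with hypothetical landmark positions and feature definitions.
lm = np.random.rand(8, 2) * 100
pairs = [(0, 4), (1, 5), (2, 6), (3, 7)]   # 4 of the 14 distances (illustrative)
polys = [(0, 1, 2, 3), (4, 5, 6, 7)]       # the 2 areas (illustrative)
print(geometric_features(lm, pairs, polys))
```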

20 citations


Proceedings ArticleDOI
02 May 2018
TL;DR: In this paper, Maximally Stable Extremal Regions (MSER) were used to acquire the text region candidates, and these candidate regions were then reduced in quantity using geometric and stroke width properties.
Abstract: Text detection is one of the most challenging and most commonly addressed problems in computer vision. Detecting text regions is the first step of text recognition systems, known as Optical Character Recognition (OCR) systems. This process requires separating text regions from non-text regions. In this paper, we utilize Maximally Stable Extremal Regions to acquire the initial text region candidates. These candidate regions are then reduced in quantity using geometric and stroke width properties. Candidate regions are joined to obtain text groups. Finally, the Tesseract Optical Character Recognition engine is utilized as the last step to eliminate non-text groups. We evaluated the proposed system on the KAIST and ICDAR datasets for both natural images and computer-generated images. For natural images, 82.7% precision and 52.0% f-accuracy are achieved; for computer-generated images, 64.0% precision and 65.2% f-accuracy.
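A minimal sketch of such a pipeline, using OpenCV's MSER implementation and the pytesseract wrapper as stand-ins; the thresholds, the exact filtering rules, and the grouping of characters into text lines (omitted here) are assumptions:

```python
import cv2
import pytesseract  # requires the Tesseract binary to be installed

img = cv2.imread("scene.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 1: MSER gives the initial text-region candidates.
mser = cv2.MSER_create()
regions, _ = mser.detectRegions(gray)

# Step 2: geometric filtering on size and aspect ratio.
candidates = []
for pts in regions:
    x, y, w, h = cv2.boundingRect(pts)
    aspect = w / float(h)
    if h < 8 or aspect < 0.1 or aspect > 5.0:  # too small / too elongated
        continue
    candidates.append((x, y, w, h))

# Step 3: OCR verification; keep regions for which Tesseract finds actual text.
for x, y, w, h in candidates:
    roi = gray[y:y + h, x:x + w]
    text = pytesseract.image_to_string(roi, config="--psm 7").strip()
    if text:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```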

19 citations


Journal ArticleDOI
TL;DR: In this article, a group of approaches for covert biometric recognition in surveillance environments is presented, considering the adversity of the conditions where recognition should be carried out (e.g., poor resolution, bad lighting, off-pose and partially occluded data).
Abstract: Performing covert biometric recognition in surveillance environments has been regarded as a grand challenge, considering the adversity of the conditions where recognition should be carried out (e.g., poor resolution, bad lighting, off-pose and partially occluded data). This special issue compiles a group of approaches to this problem.

16 citations


Journal ArticleDOI
TL;DR: In this paper, a new ear dataset using the Multi-PIE face dataset, which is named as MultiPIE ear dataset, was collected to enable domain adaptation, and to improve the performance further, they have combined different deep convolutional neural network models.
Abstract: In this paper, we have extensively investigated the unconstrained ear recognition problem. We have first shown the importance of domain adaptation when deep convolutional neural network models are used for ear recognition. To enable domain adaptation, we have collected a new ear dataset using the Multi-PIE face dataset, which we named the Multi-PIE ear dataset. To improve the performance further, we have combined different deep convolutional neural network models. We have analyzed in depth the effect of ear image quality, for example illumination and aspect ratio, on the classification performance. Finally, we have addressed the problem of dataset bias in the ear recognition field. Experiments on the UERC dataset have shown that domain adaptation leads to a significant performance improvement. For example, when the VGG-16 model is used and domain adaptation is applied, an absolute increase of around 10% has been achieved. Combining different deep convolutional neural network models has further improved the accuracy by 4%. It has also been observed that image quality has an influence on the results. In the experiments that we have conducted to examine the dataset bias, given an ear image, we were able to classify the dataset that it has come from with 99.71% accuracy, which indicates a strong bias among the ear recognition datasets.

16 citations


Proceedings ArticleDOI
02 May 2018
TL;DR: Experimental results show that Gabor-filter-based initialization of the network has similar characteristics to model transfer and can be applied for transfer learning without using a pre-trained model.
Abstract: In transfer learning, for a given classification task, the learning from a source domain to a target domain is achieved by training/transferring a pre-trained network with data from the target domain. During this process, a pre-trained network is a prerequisite for transferring the knowledge from the source domain to the target domain. In this study, to eliminate the need for such a pre-trained model, Gabor filters are utilized. In the proposed method, a convolutional neural network is constructed by initializing its first convolutional layer, which represents low-level features such as corners and edges, with Gabor filters that have similar low-level characteristics. Experimental results on the MNIST, CIFAR-10, and CIFAR-100 datasets show that Gabor-filter-based initialization of the network has similar characteristics to model transfer and can be applied for transfer learning without using a pre-trained model.
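A minimal sketch of this initialization scheme, assuming a single-channel (MNIST-style) input and an illustrative grid of Gabor parameters; the filter count, kernel size, and parameter values are not the paper's configuration:

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def gabor_bank(n_filters=32, ksize=5):
    """Build a bank of Gabor kernels over a grid of orientations and wavelengths."""
    kernels = []
    thetas = np.linspace(0, np.pi, 8, endpoint=False)  # 8 orientations
    lambdas = [2.0, 3.0, 4.0, 5.0]                     # 4 wavelengths
    for lam in lambdas:
        for theta in thetas:
            k = cv2.getGaborKernel((ksize, ksize), sigma=2.0, theta=theta,
                                   lambd=lam, gamma=0.5, psi=0)
            kernels.append(k)
    return np.stack(kernels[:n_filters]).astype(np.float32)

# First convolutional layer of the network, initialized with the Gabor bank
# instead of weights copied from a pre-trained model.
conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, padding=2)
bank = torch.from_numpy(gabor_bank()).unsqueeze(1)  # shape (32, 1, 5, 5)
with torch.no_grad():
    conv1.weight.copy_(bank)
```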

13 citations


Posted Content
TL;DR: It is indicated that ear images provide useful cues for age and gender classification; however, further work is required for age estimation.
Abstract: In this paper, we present a detailed analysis of extracting soft biometric traits, age and gender, from ear images. Although there have been a few previous works on gender classification using ear images, to the best of our knowledge, this study is the first work on age classification from ear images. In the study, we have utilized both geometric features and appearance-based features for ear representation. The utilized geometric features are based on eight anthropometric landmarks and consist of 14 distance measurements and two area calculations. The appearance-based methods employ deep convolutional neural networks for representation and classification. The well-known convolutional neural network models, namely AlexNet, VGG-16, GoogLeNet, and SqueezeNet, have been adopted for the study. They have been fine-tuned on a large-scale ear dataset that has been built from the profile and close-to-profile face images in the Multi-PIE face dataset. This way, we have performed domain adaptation. The updated models have been fine-tuned once more on the small-scale target ear dataset, which contains only around 270 ear images for training. According to the experimental results, appearance-based methods have been found to be superior to the methods based on geometric features. We have achieved 94% accuracy for gender classification, whereas 52% accuracy has been obtained for age classification. These results indicate that ear images provide useful cues for age and gender classification; however, further work is required for age estimation.

9 citations


Journal ArticleDOI
TL;DR: Computer vision-based game designs for physical exercise are presented, and changes in skin temperature and the frequency content of BVP are found to provide useful information for estimating players' engagement under two distinct difficulty levels.
Abstract: Engagement is a key factor in gaming. Especially in gamification applications, users' engagement levels have to be assessed in order to determine the usability of the developed games. The authors first present computer vision-based game design for physical exercise. All games are played with gesture controls. The authors conduct user studies in order to evaluate the perception of the games using a game engagement questionnaire. Participants state that the games are interesting and they want to play them again. Next, as a use case, the authors integrate one of these games into a robot-assisted rehabilitation system. The authors perform additional user studies by employing the self-assessment manikin to assess the difficulty levels, which can range from boredom to excitement. The authors observe that with increasing difficulty level, users' arousal increases. Additionally, the authors perform psychophysiological signal analysis of the participants during the execution of the game under two distinct difficulty levels. The authors derive features from the signals obtained from blood volume pulse (BVP), skin conductance, and skin temperature sensors. As a result of analysis of variance and sequential forward selection, the authors find that changes in the temperature and frequency content of BVP provide useful information to estimate the players' engagement.
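The exact feature definitions are not specified here; the sketch below illustrates the general approach with one assumed feature, band power of the BVP signal via Welch's method, compared across two difficulty levels with a one-way ANOVA on synthetic stand-in data:

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid
from scipy.stats import f_oneway

def bvp_band_power(signal, fs=64.0, band=(0.5, 4.0)):
    """Power of the BVP signal within a frequency band (Hz), via Welch's method.
    The sampling rate and band limits are illustrative assumptions."""
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), 256))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return trapezoid(psd[mask], freqs[mask])

# Hypothetical recordings: one feature value per participant per difficulty level.
easy = [bvp_band_power(np.random.randn(4096)) for _ in range(10)]
hard = [bvp_band_power(np.random.randn(4096)) for _ in range(10)]
f_stat, p_value = f_oneway(easy, hard)  # one-way ANOVA across the two conditions
print(f"F={f_stat:.2f}, p={p_value:.3f}")
```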

Proceedings ArticleDOI
01 Sep 2018
TL;DR: Experimental results show that pose normalization improves the performance of cross-pose facial expression recognition; the performance increase is especially significant when local binary patterns are used in combination with a support vector machine classifier.
Abstract: In this paper, we have explored the effect of pose normalization on cross-pose facial expression recognition. We have first presented an expression-preserving face frontalization method. After the face frontalization step, for facial expression representation and classification, we have employed both a traditional approach, using hand-crafted features, namely local binary patterns, in combination with support vector machine classification, and a more recent approach based on convolutional neural networks. To evaluate the impact of face frontalization on facial expression recognition performance, we have conducted cross-pose, subject-independent expression recognition experiments using the BU3DFE database. Experimental results show that pose normalization improves the performance of cross-pose facial expression recognition. The performance increase is especially significant when local binary patterns are used in combination with a support vector machine classifier, since this representation and classification scheme does not itself handle pose variations. The convolutional neural network-based approach is found to handle pose variations more successfully when it is fine-tuned on a dataset that contains face images with varying pose angles. Its performance is further enhanced by face frontalization.
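The hand-crafted branch of this pipeline can be sketched roughly as follows; the LBP parameters, image size, and the single-histogram representation are simplifying assumptions (block-wise histograms over a face grid are common in practice):

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_face, P=8, R=1):
    """Uniform LBP histogram of a grayscale face image (P neighbours, radius R)."""
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    # Uniform LBP with P neighbours yields P + 2 distinct codes.
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Hypothetical training data: frontalized face crops with expression labels.
X = np.array([lbp_histogram(np.random.randint(0, 256, (64, 64))) for _ in range(40)])
y = np.random.randint(0, 6, 40)  # six basic expressions (stand-in labels)
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X[:5]))
```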

Proceedings ArticleDOI
02 May 2018
TL;DR: The proposed system, which utilizes the VGG-16 network model and performs two-stage fine-tuning, outperforms the previous state-of-the-art approaches on the TU Berlin sketch dataset by reaching 79.72% accuracy.
Abstract: The sketch classification problem is challenging for several reasons, such as the absence of color and texture information, the lack of detailed object information, and quality that depends on the drawing ability of the person. In this study, the sketch classification problem is addressed by using deep convolutional neural network models. Specifically, the effect of domain adaptation is examined when fine-tuning the convolutional neural networks for sketch classification. By employing domain adaptation, the classification accuracy is increased by around 3%. The proposed system, which utilizes the VGG-16 network model and performs two-stage fine-tuning, outperforms the previous state-of-the-art approaches on the TU Berlin sketch dataset by reaching 79.72% accuracy.

Proceedings ArticleDOI
02 May 2018
TL;DR: The proposed method significantly outperforms the best approach at the ICB-RW 2016 challenge, whose Rank-1 and Rank-5 identification rates and Area Under the Curve of the Cumulative Match Score were 69.8%, 85.3%, and 0.954, respectively.
Abstract: In this paper, we addressed the problem of face recognition under mismatched conditions. In the proposed system, for face representation, we leveraged state-of-the-art deep learning models trained on the VGGFace2 dataset. More specifically, we used pretrained convolutional neural network models to extract 2048-dimensional feature vectors from face images of the International Challenge on Biometric Recognition in the Wild dataset, ICB-RW 2016 for short. In this challenge, the gallery images were collected under controlled, indoor studio settings, whereas probe images were acquired from outdoor surveillance cameras. For classification, we trained a nearest neighbor classifier using correlation as the distance metric. Experiments on the ICB-RW 2016 dataset have shown that the employed deep learning models trained on the VGGFace2 dataset provide superior performance. Even using a single model, compared to the ICB-RW 2016 winner system, around 15% absolute increase in Rank-1 correct classification rate has been achieved. Combining individual models at the feature level has improved the performance further. The ensemble of four models achieved 91.8% Rank-1 and 98.0% Rank-5 identification rates, and 0.997 Area Under the Curve of the Cumulative Match Score on the probe set. The proposed method significantly outperforms the best approach at the ICB-RW 2016 challenge, whose Rank-1 and Rank-5 identification rates and Area Under the Curve of the Cumulative Match Score were 69.8%, 85.3%, and 0.954, respectively.
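A minimal sketch of this matching scheme: feature-level fusion of several CNN embeddings followed by a 1-nearest-neighbor classifier with correlation distance. The fusion rule (L2-normalize, then concatenate) is an assumption, and the embeddings here are random stand-ins for the four models' 2048-dimensional features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fuse(feature_sets):
    """Feature-level fusion: L2-normalize each model's vectors, then concatenate."""
    normed = [f / np.linalg.norm(f, axis=1, keepdims=True) for f in feature_sets]
    return np.concatenate(normed, axis=1)

# Hypothetical embeddings from four models, for 90 gallery and 30 probe faces.
gallery = fuse([np.random.randn(90, 2048) for _ in range(4)])
probe = fuse([np.random.randn(30, 2048) for _ in range(4)])
gallery_ids = np.arange(90)  # one enrolled image per identity

# Correlation distance requires brute-force search in scikit-learn.
nn = KNeighborsClassifier(n_neighbors=1, metric="correlation", algorithm="brute")
nn.fit(gallery, gallery_ids)
print(nn.predict(probe)[:5])  # Rank-1 identity predictions for the first probes
```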


Proceedings ArticleDOI
02 May 2018
TL;DR: A Q-learning approach is applied to Geometry Friends and a generalized circle agent for different types of environments is implemented; results show that the proposed method improves game completion rate and completion times compared to a random agent.
Abstract: Reinforcement learning began to achieve human-level success in game intelligence after the deep learning revolution. Geometry Friends is a puzzle game where we can benefit from deep learning and expect to build successful game-playing agents. In the game, agents collect targets in a two-dimensional environment and try to overcome obstacles along the way. In this paper, a Q-learning approach is applied to this game, and a generalized circle agent for different types of environments is implemented. The agent is trained using only screen pixels as input, processed by a convolutional neural network. Experimental results show that, with the proposed method, game completion rate and completion times are improved compared to a random agent.
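In the spirit of this description, a deep Q-network over screen pixels might look like the following sketch; the architecture, frame size, and action set are assumptions, not the paper's configuration:

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Maps a stack of grayscale screen frames to one Q-value per action."""
    def __init__(self, n_actions, in_frames=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_frames, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # 9x9 feature map for 84x84 input
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def select_action(qnet, state, n_actions, epsilon=0.1):
    """Epsilon-greedy action selection over the network's Q-values."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(qnet(state.unsqueeze(0)).argmax(dim=1))

qnet = QNet(n_actions=3)          # e.g. roll left, roll right, jump (assumed)
state = torch.rand(4, 84, 84)     # 4 stacked 84x84 screen frames
print(select_action(qnet, state, n_actions=3))
```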

Proceedings ArticleDOI
02 May 2018
TL;DR: To reduce the difference between the distributions of the utilized datasets in a cross-dataset setup, a cycle-consistent generative adversarial network based deep learning approach is proposed that makes the source and target datasets look more similar.
Abstract: Most of the studies that have been conducted on person re-identification utilize a single dataset to train, validate, and test the proposed system. Although these subsets do not overlap, since they were collected under similar conditions, experimental results obtained from such a setup are not good indicators of the generalizability of the developed systems. Therefore, to obtain a better measure of the generalization capability of the proposed systems, cross-dataset experimental setups are more appropriate. In the cross-dataset setup, the developed systems are trained and validated on one dataset and then tested using another one. In this work, to reduce the difference between the distributions of the utilized datasets in a cross-dataset setup, we proposed a cycle-consistent generative adversarial network based deep learning approach. The proposed method makes the source and target datasets look more similar. In the experiments, the Market-1501 dataset was used as the source and PRID2011 as the target. By benefiting from the proposed domain adaptation method, superior results have been achieved.

Posted Content
TL;DR: This work introduces a new scene graph generation method called image-level attentional context modeling (ILAC), which comprises a single-stream network that iteratively refines the scene graph with a nested graph neural network.
Abstract: We introduce a new scene graph generation method called image-level attentional context modeling (ILAC). Our model includes an attentional graph network that effectively propagates contextual information across the graph using image-level features. Whereas previous works use an object-centric context, we build an image-level context agent to encode the scene properties. The proposed method comprises a single-stream network that iteratively refines the scene graph with a nested graph neural network. We demonstrate that our approach achieves competitive performance with the state-of-the-art for scene graph generation on the Visual Genome dataset, while requiring fewer parameters than other methods. We also show that ILAC can improve regular object detectors by incorporating relational image-level information.
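As a toy reading of the core idea above, not the ILAC architecture itself, the sketch below refines object node states by attending to a single image-level context vector, with a GRU cell carrying state across refinement iterations; all dimensions and the attention form are assumptions:

```python
import torch
import torch.nn as nn

class ImageLevelContextStep(nn.Module):
    """One refinement step: nodes attend to an image-level context vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.attn = nn.Linear(2 * dim, 1)  # scores each node against the context
        self.gru = nn.GRUCell(dim, dim)    # updates node states with the context message

    def forward(self, nodes, image_ctx):
        """nodes: (N, dim) object features; image_ctx: (dim,) global image feature."""
        ctx = image_ctx.expand_as(nodes)
        scores = torch.softmax(self.attn(torch.cat([nodes, ctx], dim=1)), dim=0)
        message = scores * ctx             # attention-weighted context per node
        return self.gru(message, nodes)    # refined node states

step = ImageLevelContextStep()
nodes = torch.randn(5, 256)    # 5 detected objects (stand-in features)
image_ctx = torch.randn(256)   # image-level context (stand-in feature)
for _ in range(3):             # iterative refinement of the scene graph nodes
    nodes = step(nodes, image_ctx)
```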

Proceedings ArticleDOI
02 May 2018
TL;DR: The goal is to reach a face database of at least 3000 subjects, each having at least 100 video sequences; it is believed that this database would become an invaluable resource for automatic facial image processing and analysis research, especially for approaches that aim at exploiting dynamic features.
Abstract: Facial image processing and analysis has numerous applications. Recently, with deep learning based approaches, a significant performance improvement has been obtained. An important factor affecting the performance of deep learning-based systems is the availability of large amounts of data. Therefore, a couple of large-scale face datasets that contain still images of subjects have recently become publicly available. However, a video face database that contains both a large number of subjects and a large number of samples per subject has not been available yet. In this study, to fulfill this need, Turkish TV series have been used, and the work has focused on the preparation of a multi-view face sequence dataset. In the study, state-of-the-art off-the-shelf tools have been utilized for face detection, face alignment, and face recognition, and the steps we have followed to collect the database are presented. Our goal is, upon completion of the processes, to reach a face database of at least 3000 subjects, each having at least 100 video sequences. In the database, along with the identity labels, we also plan to generate age and gender labels. We believe that this database will become an invaluable resource for automatic facial image processing and analysis research, especially for approaches that aim at exploiting dynamic features.
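The abstract does not name the off-the-shelf components, so the sketch below uses an OpenCV Haar cascade purely as a stand-in detector to illustrate the per-frame face-harvesting step; the file name and detection parameters are hypothetical, and alignment and recognition would follow the detection step:

```python
import cv2

# Stand-in face detector; the study's actual off-the-shelf tools are not named here.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("episode.mp4")  # hypothetical TV-series episode file
frame_idx, detections = 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces in this frame; alignment and recognition would be applied
    # to each detection before grouping frames into per-identity sequences.
    for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1,
                                                  minNeighbors=5, minSize=(40, 40)):
        detections.append((frame_idx, x, y, w, h))
    frame_idx += 1
cap.release()
print(f"{len(detections)} face detections across {frame_idx} frames")
```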