Gi Pyo Nam
Bio: Gi Pyo Nam is an academic researcher at the Korea Institute of Science and Technology. The author has contributed to research on topics including facial recognition systems and facial expression, has an h-index of 5, and has co-authored 13 publications receiving 48 citations.
TL;DR: The proposed method generates a meaningful and smooth synopsis of long-duration videos according to a user's query; it is superior to existing techniques and produces a visually seamless video synopsis.
Abstract: Synopsis of a long-duration video has many applications in intelligent transportation systems. It can help to monitor traffic with less manpower. However, generating a meaningful synopsis of a long-duration video recording can be challenging. Summarized outputs often include redundant content or activities that may not be helpful to the observer. Moving object trajectories are possible sources of information that can be used to generate the synopsis of long-duration videos. Synopsis generation faces challenges in object tracking, in grouping trajectories with respect to activity type, object category, and contextual information, and in generating a smooth synopsis according to a query. In this paper, we propose a method to generate a meaningful and smooth synopsis of long-duration videos according to a user's query. We track moving objects and adopt deep learning to classify them into known categories (e.g., cars, bikes, and pedestrians). We then identify regions in the surveillance scene with the help of unsupervised clustering. Each tube (spatiotemporal object trajectory) is represented by its source and destination. In the final stage, we take a query from the user and generate the synopsis video by smoothly blending the appropriate tubes over the background frame through energy minimization. The proposed method has been evaluated on two publicly available datasets and our own surveillance datasets, and compared with popular state-of-the-art techniques. The experiments reveal that the proposed method is superior to the existing techniques and produces a visually seamless video synopsis.
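The final stage described above, placing query-matching tubes on a compressed timeline, can be sketched as follows. This is a hypothetical simplification, not the authors' code: each tube is reduced to a category and a duration, and each one is greedily shifted to the earliest start time at which the number of temporally overlapping tubes stays below a capacity limit (a crude stand-in for the paper's energy minimization over spatial collisions).

```python
# Minimal sketch (hypothetical, not the published method) of query-driven
# tube rearrangement for video synopsis. A "tube" is an object trajectory
# reduced to {'id', 'category', 'length'}; tubes matching the query are
# greedily packed onto a shorter timeline.

def overlap(a, b):
    """Temporal overlap (in frames) between two placed tubes (start, length)."""
    s1, l1 = a
    s2, l2 = b
    return max(0, min(s1 + l1, s2 + l2) - max(s1, s2))

def arrange_tubes(tubes, query, max_concurrent=2):
    """Return {tube id: start frame} for tubes whose category matches the query."""
    selected = [t for t in tubes if t["category"] == query]
    placed = []   # (start, length) of already-scheduled tubes
    layout = {}
    for t in sorted(selected, key=lambda t: -t["length"]):
        start = 0
        # advance until fewer than max_concurrent tubes overlap this slot
        while sum(overlap((start, t["length"]), p) > 0 for p in placed) >= max_concurrent:
            start += 1
        placed.append((start, t["length"]))
        layout[t["id"]] = start
    return layout

tubes = [
    {"id": "car1", "category": "car", "length": 5},
    {"id": "car2", "category": "car", "length": 3},
    {"id": "ped1", "category": "pedestrian", "length": 4},
]
print(arrange_tubes(tubes, "car"))  # {'car1': 0, 'car2': 0}
```

In the real system the collision energy is spatial as well as temporal, and the selected tubes are alpha-blended over a background frame; the sketch only captures the scheduling shape of that step.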
TL;DR: PSI-CNN is proposed, a generic pyramid-based scale-invariant CNN architecture which additionally extracts untrained feature maps across multiple image resolutions, thereby allowing the network to learn scale-independent information and improving the recognition performance on low resolution images.
Abstract: Face recognition is one research area that has benefited from the recent popularity of deep learning, namely the convolutional neural network (CNN) model. Nevertheless, the recognition performance is still compromised by the model’s dependency on the scale of input images and the limited number of feature maps in each layer of the network. To circumvent these issues, we propose PSI-CNN, a generic pyramid-based scale-invariant CNN architecture which additionally extracts untrained feature maps across multiple image resolutions, thereby allowing the network to learn scale-independent information and improving the recognition performance on low resolution images. Experimental results on the LFW dataset and our own CCTV database show PSI-CNN consistently outperforming the widely-adopted VGG face model in terms of face matching accuracy.
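The pyramid idea behind PSI-CNN can be illustrated with a toy sketch (a hypothetical simplification, not the published architecture): the same feature extractor is applied to the input at several resolutions, and the per-scale features are concatenated so the representation is less sensitive to input scale. Here the "CNN feature" is just a global mean and variance.

```python
# Toy, dependency-free sketch of pyramid-based, scale-aware feature
# extraction. Real PSI-CNN fuses convolutional feature maps; here a simple
# global statistic stands in for the learned features.

def downscale(img, factor):
    """Average-pool a 2-D list image by an integer factor."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(0, h - factor + 1, factor):
        row = []
        for j in range(0, w - factor + 1, factor):
            block = [img[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def global_feature(img):
    """Stand-in 'CNN feature': global mean and variance of the image."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [mean, var]

def pyramid_feature(img, factors=(1, 2, 4)):
    """Concatenate features extracted at each pyramid level."""
    feat = []
    for f in factors:
        level = img if f == 1 else downscale(img, f)
        feat.extend(global_feature(level))
    return feat

img = [[float((i + j) % 4) for j in range(8)] for i in range(8)]
print(len(pyramid_feature(img)))  # 6 values: (mean, variance) at 3 scales
```

The design point is that low-resolution probes (e.g., CCTV crops) already resemble one of the coarser pyramid levels, so a matcher trained on the concatenated representation degrades more gracefully than a single-scale one.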
04 Jun 2019
University of Ljubljana, Korea Institute of Science and Technology, Indian Institute of Technology Bombay, Istanbul Technical University, University of Science and Technology Beijing, National Institute of Technology Rourkela, Warsaw University of Technology, University of South Florida, Federal University of Bahia
TL;DR: This analysis shows that methods incorporating deep learning models clearly outperform techniques relying solely on hand-crafted descriptors, even though both groups of techniques exhibit similar behavior when it comes to robustness to various covariates, such as presence of occlusions, changes in (head) pose, or variability in image resolution.
Abstract: This paper presents a summary of the 2019 Unconstrained Ear Recognition Challenge (UERC), the second in a series of group benchmarking efforts centered around the problem of person recognition from ear images captured in uncontrolled settings. The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze performance of the technology from various viewpoints, such as generalization abilities to unseen data characteristics, sensitivity to rotations, occlusions and image resolution, and performance bias on sub-groups of subjects selected based on demographic criteria, i.e., gender and ethnicity. Research groups from 12 institutions entered the competition and submitted a total of 13 recognition approaches ranging from descriptor-based methods to deep-learning models. The majority of submissions focused on ensemble-based methods combining either representations from multiple deep models or hand-crafted with learned image descriptors. Our analysis shows that methods incorporating deep learning models clearly outperform techniques relying solely on hand-crafted descriptors, even though both groups of techniques exhibit similar behavior when it comes to robustness to various covariates, such as presence of occlusions, changes in (head) pose, or variability in image resolution. The results of the challenge also show that there has been considerable progress since the first UERC in 2017, but that there is still ample room for further research in this area.
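Entries in identification challenges of this kind are typically compared via the cumulative match characteristic (CMC), whose first entry is the rank-1 recognition rate. The sketch below is illustrative, not the official UERC evaluation code: given a probe-by-gallery distance matrix and identity labels, it computes the CMC curve.

```python
# Illustrative identification-protocol evaluation: compute the CMC curve
# from a probe-by-gallery distance matrix (smaller distance = better match).

def cmc(distances, gallery_ids, probe_ids):
    """distances[p][g] = distance between probe p and gallery sample g.
    Returns the CMC curve as a list indexed by rank (1-based rank k at
    position k-1)."""
    ranks = []
    for p, row in enumerate(distances):
        order = sorted(range(len(row)), key=lambda g: row[g])
        # 1-based rank at which the correct identity first appears
        rank = next(i for i, g in enumerate(order, 1)
                    if gallery_ids[g] == probe_ids[p])
        ranks.append(rank)
    n = len(ranks)
    return [sum(r <= k for r in ranks) / n
            for k in range(1, len(gallery_ids) + 1)]

gallery_ids = ["A", "B", "C"]
probe_ids = ["A", "C"]
distances = [
    [0.1, 0.8, 0.9],   # probe of A: gallery A is closest -> rank 1
    [0.4, 0.3, 0.5],   # probe of C: gallery C is third closest -> rank 3
]
curve = cmc(distances, gallery_ids, probe_ids)
print(curve)  # [0.5, 0.5, 1.0]; rank-1 rate is curve[0]
```

Covariate analyses like those in the paper then slice the probe set (by occlusion, pose, resolution, or demographic group) and recompute the same curve per slice.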
TL;DR: The results indicate that DEFace can detect the face region more accurately in comparison to the other state-of-the-art methods while maintaining the processing time.
Abstract: This study proposes a novel face detector called DEFace that focuses on the challenging tasks of face detection: coping with faces smaller than 12 pixels and with occlusions caused by masks or human body parts. We extend the feature pyramid network (FPN) module to detect small faces by expanding the range of its P layers, and strengthen the network by adding a receptive context module (RCM) after each predicted feature head from the top-down pathway in the FPN architecture to enhance feature discriminability and robustness. Following the FPN principle, combining low- and high-resolution features is beneficial for detecting objects of different sizes. Furthermore, with the assistance of the RCM, the proposed method can use a broad range of context information, especially for small faces. To evaluate the performance of the proposed method, various public face datasets are used, such as the WIDER Face dataset, the face detection dataset and benchmark (FDDB), and the masked faces (MAFA) dataset, which consist of challenging samples such as small face regions and occlusions by hair or other people. The results indicate that DEFace can detect face regions more accurately than the other state-of-the-art methods while maintaining the processing time.
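The FPN top-down pathway that DEFace builds on can be shown with a toy 1-D sketch (illustrative only; the paper's extended P layers and RCM are not reproduced here): higher-level, lower-resolution features are upsampled and added to same-level lateral features, so each pyramid level mixes coarse context with fine detail.

```python
# Toy 1-D feature-pyramid top-down fusion. Each "feature map" is a list of
# floats; each coarser level is half the length of the one below it.

def upsample(x):
    """Nearest-neighbour 2x upsampling of a 1-D feature list."""
    out = []
    for v in x:
        out.extend([v, v])
    return out

def top_down_fpn(bottom_up):
    """bottom_up: pyramid levels, finest first. Returns fused levels,
    finest first: each level = its lateral features + upsampled coarser level."""
    levels = [bottom_up[-1]]               # coarsest level passes through
    for lateral in reversed(bottom_up[:-1]):
        fused = [a + b for a, b in zip(upsample(levels[0]), lateral)]
        levels.insert(0, fused)
    return levels

c2 = [1.0, 2.0, 3.0, 4.0]   # fine, high-resolution features
c3 = [10.0, 20.0]           # coarser
c4 = [100.0]                # coarsest, most semantic
p2, p3, p4 = top_down_fpn([c2, c3, c4])
print(p3)  # [110.0, 120.0]
print(p2)  # [111.0, 112.0, 123.0, 124.0]
```

Extending the range of P layers, as the paper does, amounts to adding finer (higher-resolution) levels to this pyramid so that sub-12-pixel faces still map to at least one level with usable spatial detail.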
TL;DR: A novel framework for automatically assessing facial attractiveness that considers four ratio feature sets as objective elements of facial attractiveness and three regression-based predictors to estimate a facial beauty score is presented.
Abstract: In this paper, we present a novel framework for automatically assessing facial attractiveness that considers four ratio feature sets as objective elements of facial attractiveness. In our framework, these feature sets are combined with three regression-based predictors to estimate a facial beauty score. To enhance the system’s performance to make it comparable with human scoring, we apply a score fusion technique. Experimental results show that the attractiveness score obtained by the proposed framework better correlates with human assessments than the scores from other predictors. The framework’s modularity allows any features or predictors to be integrated into the facial attractiveness measure. Our proposed framework can be applied to many beauty-related fields, such as the plastic surgery, cosmetics, and entertainment industries.
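The pipeline shape described above can be sketched in a few lines. Everything specific here is invented for illustration: the landmark names, the two ratio features, and the stand-in "regressors" are hypothetical, and real predictors would be trained models; only the structure (ratio features, then multiple predictors, then score fusion) mirrors the framework.

```python
# Hypothetical sketch of a ratio-feature + regression + score-fusion
# attractiveness pipeline. Landmark names and weights are invented.

def ratio_features(landmarks):
    """landmarks: dict of (x, y) points; returns simple distance ratios."""
    def dist(a, b):
        (x1, y1), (x2, y2) = landmarks[a], landmarks[b]
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return [
        dist("eye_l", "eye_r") / dist("chin", "forehead"),  # eye span / face height
        dist("nose", "chin") / dist("chin", "forehead"),    # lower-face proportion
    ]

def fuse_scores(features, predictors, weights):
    """Weighted-sum score fusion over the regressors' outputs."""
    scores = [p(features) for p in predictors]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

landmarks = {"eye_l": (30, 40), "eye_r": (70, 40),
             "nose": (50, 60), "chin": (50, 100), "forehead": (50, 20)}
feats = ratio_features(landmarks)
# two stand-in "regressors" (real ones would be trained on human ratings)
predictors = [lambda f: 10 * f[0], lambda f: 10 * f[1]]
print(fuse_scores(feats, predictors, weights=[1.0, 1.0]))  # 5.0
```

The modularity claimed in the abstract shows up here directly: adding a feature set or a predictor only means appending to the lists passed in, without touching the fusion step.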
TL;DR: The pre-trained CNN with a multi-class support vector machine (SVM) classifier achieved higher accuracy than most state-of-the-art models, with an improvement in recognition accuracy of up to 39%.
Abstract: Face recognition (FR) is defined as the process through which people are identified using facial images. This technology is applied broadly in biometrics, information security, access control, law enforcement, smart cards, and surveillance. A facial recognition system is built in two steps: the first extracts facial features, and the second performs pattern classification. Deep learning, specifically the convolutional neural network (CNN), has recently made commendable progress in FR technology. This paper investigates the performance of a pre-trained CNN with a multi-class support vector machine (SVM) classifier, and the performance of transfer learning using the AlexNet model, for classification. The study considers CNN architectures that have recorded the best outcomes in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in past years, specifically AlexNet and ResNet-50. Recognition accuracy was used as the criterion for assessing performance optimization of the CNN algorithm. Improved classification rates were seen in comprehensive experiments completed on the ORL, GTAV face, Georgia Tech face, labeled faces in the wild (LFW), frontalized labeled faces in the wild (F_LFW), YouTube face, and FEI face datasets. The results show that our model achieved higher accuracy than most state-of-the-art models, with accuracies ranging from 94% to 100% across all databases and an improvement in recognition accuracy of up to 39%.
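The two-step pipeline (frozen feature extractor, then a classifier) can be sketched without any deep-learning dependency. To keep the example self-contained, a fixed statistic stands in for the pre-trained CNN features and a nearest-centroid classifier stands in for the multi-class SVM; only the pipeline structure, not the models, reflects the paper.

```python
# Two-step recognition sketch: (1) frozen feature extraction, (2) a simple
# classifier. Stand-ins: global statistics replace CNN features, and
# nearest-centroid replaces the multi-class SVM.

def extract_features(image):
    """Stand-in for frozen pretrained-CNN features: mean and value range."""
    vals = [v for row in image for v in row]
    return [sum(vals) / len(vals), max(vals) - min(vals)]

def train(images, labels):
    """Fit one centroid per identity in feature space."""
    by_label = {}
    for img, lab in zip(images, labels):
        by_label.setdefault(lab, []).append(extract_features(img))
    return {lab: [sum(dim) / len(feats) for dim in zip(*feats)]
            for lab, feats in by_label.items()}

def predict(model, image):
    """Assign the identity whose centroid is nearest to the probe's features."""
    f = extract_features(image)
    return min(model, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(f, model[lab])))

faces_a = [[[0, 0], [0, 1]], [[0, 1], [0, 0]]]   # tiny "subject A" images
faces_b = [[[9, 9], [9, 8]], [[8, 9], [9, 9]]]   # tiny "subject B" images
model = train(faces_a + faces_b, ["A", "A", "B", "B"])
print(predict(model, [[0, 1], [1, 0]]))  # A
```

In the paper's setting, `extract_features` would be the activations of a frozen AlexNet or ResNet-50 layer and the classifier a multi-class SVM; the training step would otherwise look the same.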
TL;DR: A novel system for ear recognition based on ensembles of deep CNN-based models, specifically Visual Geometry Group (VGG)-like network architectures, for extracting discriminative deep features from ear images, achieving significant improvements over recently published results.
Abstract: The recognition performance of visual recognition systems is highly dependent on extracting and representing the discriminative characteristics of image data. Convolutional neural networks (CNNs) have shown unprecedented success in a variety of visual recognition tasks due to their capability to provide in-depth representations exploiting visual image features of appearance, color, and texture. This paper presents a novel system for ear recognition based on ensembles of deep CNN-based models and more specifically the Visual Geometry Group (VGG)-like network architectures for extracting discriminative deep features from ear images. We began by training different networks of increasing depth on ear images with random weight initialization. Then, we examined pretrained models as feature extractors as well as fine-tuning them on ear images. After that, we built ensembles of the best models to further improve the recognition performance. We evaluated the proposed ensembles through identification experiments using ear images acquired under controlled and uncontrolled conditions from mathematical analysis of images (AMI), AMI cropped (AMIC) (introduced here), and West Pomeranian University of Technology (WPUT) ear datasets. The experimental results indicate that our ensembles of models yield the best performance with significant improvements over the recently published results. Moreover, we provide visual explanations of the learned features by highlighting the relevant image regions utilized by the models for making decisions or predictions.
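The ensembling step can be sketched with a small score-fusion example (a hypothetical simplification, not the paper's code): each model produces per-class scores for a probe, and the ensemble averages them before taking the arg-max, which tends to smooth out individual models' errors.

```python
# Score-level ensemble fusion: average each class's score across models,
# then pick the class with the highest fused score.

def ensemble_predict(per_model_scores):
    """per_model_scores: list of dicts {class: score}, one dict per model.
    Returns (predicted class, fused score dict)."""
    classes = per_model_scores[0].keys()
    fused = {c: sum(s[c] for s in per_model_scores) / len(per_model_scores)
             for c in classes}
    return max(fused, key=fused.get), fused

# three models scoring the same ear probe against identities A and B
scores = [
    {"A": 0.6, "B": 0.4},
    {"A": 0.3, "B": 0.7},   # one model disagrees
    {"A": 0.8, "B": 0.2},
]
label, fused = ensemble_predict(scores)
print(label)  # A — the two agreeing models outvote the dissenter
```

In the paper the ensembled members are VGG-like networks trained from scratch, pre-trained, or fine-tuned on ear images; fusion at the score level is what lets such heterogeneous members be combined without retraining.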