scispace - formally typeset
Author

Yong-Guk Kim

Bio: Yong-Guk Kim is an academic researcher at Sejong University. He has contributed to research on topics including facial recognition systems and facial expression, has an h-index of 11, and has co-authored 83 publications receiving 463 citations. His previous affiliations include the Politehnica University of Timișoara and the Smith-Kettlewell Institute.


Papers
Proceedings ArticleDOI
18 Jan 2016
TL;DR: A convolutional neural network is trained on a multi-GPU platform to categorize the driver's gaze zone from a detected face image; the trained network parameters are then transferred to a GPU in a Windows PC so the system runs in real time.
Abstract: This paper presents a study in which the driver's gaze zone is categorized using new deep learning techniques. Since the sequence of a driver's gaze zones reflects precisely what he is doing and how he behaves, it allows us to infer drowsiness, focus, or distraction by analyzing images from a camera. A Haar-feature-based face detector is combined with a correlation-filter-based MOSSE tracker for the face detection task, to handle the tough visual environment inside the car. The driving database is a large dataset constructed with a recording setup in a compact sedan driven around an urban area. The gaze zones consist of 9 categories, depending on where the driver is looking while driving. A convolutional neural network is trained on a multi-GPU platform to categorize the driver's gaze zone from a detected face image, and its network parameters are then transferred to a GPU in a Windows PC to operate in real time. Results suggest that gaze zone categorization accuracy reaches 95% on average, indicating that our system outperforms state-of-the-art gaze zone categorization methods based on conventional computer vision techniques.
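The pipeline the abstract describes (a detected face crop in, one of 9 gaze-zone labels out) can be sketched as a toy CNN forward pass. Everything below is illustrative: the zone names, filter counts, and layer sizes are assumptions, not the paper's trained architecture.

```python
import numpy as np

# Illustrative zone names; the paper only states that there are 9 zones.
GAZE_ZONES = ["front", "left mirror", "right mirror", "rear mirror",
              "speedometer", "console", "left window", "right window", "lap"]

def conv2d(img, kernel):
    """Valid 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def toy_gaze_cnn(face, kernels, weights, bias):
    """One conv layer -> ReLU -> global average pool -> linear -> softmax."""
    feats = np.array([conv2d(face, k).clip(min=0).mean() for k in kernels])
    logits = weights @ feats + bias
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()                  # probabilities over the 9 zones

rng = np.random.default_rng(0)
face = rng.random((32, 32))              # stand-in for a detected face crop
kernels = rng.standard_normal((8, 3, 3)) # 8 random (untrained) filters
weights = rng.standard_normal((9, 8))
bias = np.zeros(9)

probs = toy_gaze_cnn(face, kernels, weights, bias)
zone = GAZE_ZONES[int(np.argmax(probs))]
```

A real system would train the filters and weights by backpropagation on the driving database; the sketch only shows the shape of the computation.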

61 citations

Book ChapterDOI
20 Nov 2016
TL;DR: An approach based on recent machine learning techniques is proposed: first, a 3D convolutional neural network extracts features in the spatio-temporal domain; second, gradient boosting classifies drowsiness; third, semi-supervised learning enhances overall performance.
Abstract: Detecting driver drowsiness in a reliable and confident manner is a challenging task, since it requires accurate monitoring of facial behavior such as eye closure, nodding, and yawning. It is even harder when the driver wears sunglasses or a scarf, as happens in the dataset given for this challenge. One popular way to analyze facial behavior has been to use standard face models such as the active shape model or the active appearance model. These models work well for frontal faces, yet often stumble on extreme head poses. To handle these issues, we propose an approach based on recent machine learning techniques: first, a 3D convolutional neural network extracts features in the spatio-temporal domain; second, gradient boosting classifies drowsiness; third, semi-supervised learning enhances overall performance. The highest score among our submissions was 87.46% accuracy, suggesting that this approach has potential for real application.
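The third stage, semi-supervised learning, is not detailed in the abstract; a common realization is self-training with pseudo-labels. The sketch below illustrates that loop on synthetic 2-D features, with a nearest-centroid classifier standing in for the paper's gradient-boosting stage (all data and names here are invented for illustration).

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid stand-in for the paper's gradient-boosting classifier."""
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    """Assign each sample to the class of its nearest centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def self_training(X_lab, y_lab, X_unlab, rounds=3):
    """Pseudo-labeling loop: label the unlabeled clips, retrain, repeat."""
    X, y = X_lab, y_lab
    for _ in range(rounds):
        centroids = fit_centroids(X, y)
        pseudo = predict(centroids, X_unlab)          # pseudo-labels
        X = np.vstack([X_lab, X_unlab])               # enlarged training set
        y = np.concatenate([y_lab, pseudo])
    return fit_centroids(X, y)

# Synthetic "spatio-temporal features": class 0 near 0, class 1 near 2.
rng = np.random.default_rng(1)
X_lab = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(2, 0.3, (50, 2))])

model = self_training(X_lab, y_lab, X_unlab)
acc = (predict(model, X_unlab) == np.array([0] * 50 + [1] * 50)).mean()
```

The design point is that pseudo-labeled samples enlarge the training set between rounds; with a stronger base learner (e.g. gradient boosting) the same loop applies unchanged.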

61 citations

Proceedings ArticleDOI
17 Feb 2014
TL;DR: A system is proposed that uses gaze direction tracking and head pose estimation to detect driver drowsiness, tracing the center point of the pupil with CDF analysis and estimating the frequency of eye movement.
Abstract: This paper proposes a system that uses gaze direction tracking and head pose estimation to detect driver drowsiness. Head pose is estimated by calculating the optic flow of facial features acquired with a corner detection algorithm. Analysis of the driver's head behavior yields three movement components: nodding, shaking, and tilting. To track the driver's gaze direction, we trace the center point of the pupil using CDF (cumulative distribution function) analysis and estimate the frequency of eye movement.
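CDF-based pupil localization is commonly implemented by thresholding the eye region at a low quantile of its intensity distribution (the pupil is the darkest blob) and taking the centroid of the below-threshold pixels. A minimal sketch of that idea on a synthetic eye patch; the 5% quantile is an assumption, not the paper's value:

```python
import numpy as np

def pupil_center_cdf(eye, frac=0.05):
    """Estimate the pupil center from a grayscale eye region.

    The pupil is the darkest blob, so threshold at the intensity where the
    cumulative distribution reaches `frac` and take the centroid of the
    below-threshold pixels.
    """
    thresh = np.quantile(eye, frac)          # inverse CDF at `frac`
    ys, xs = np.nonzero(eye <= thresh)       # dark (pupil) pixels
    return ys.mean(), xs.mean()

# Synthetic eye patch: bright background with a dark disc centered at (12, 20).
eye = np.full((24, 40), 200.0)
yy, xx = np.mgrid[:24, :40]
eye[(yy - 12) ** 2 + (xx - 20) ** 2 <= 16] = 10.0

cy, cx = pupil_center_cdf(eye)
```

On real images the eye region would first be localized from the detected face, and the per-frame centers would feed the eye-movement frequency estimate the abstract mentions.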

41 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This paper reviews the second NTIRE challenge on image dehazing (restoration of rich details in hazy images), with a focus on the proposed solutions and results, which gauge the state of the art in image dehazing.
Abstract: This paper reviews the second NTIRE challenge on image dehazing (restoration of rich details in hazy images), with a focus on the proposed solutions and results. The training data consists of 55 hazy images (with dense haze generated in an indoor or outdoor environment) and their corresponding ground-truth (haze-free) images of the same scenes. The dense haze was produced using a professional haze/fog generator that imitates real haze conditions. The evaluation consists of comparing the dehazed images with the ground-truth images. The dehazing process was learnable through the provided pairs of haze-free and hazy training images. There were ~270 registered participants, and 23 teams competed in the final testing phase. Together, their solutions gauge the state of the art in image dehazing.
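The evaluation (comparing dehazed images against ground truth) is typically scored in restoration challenges with PSNR; the sketch below shows that metric, though the challenge's exact scoring protocol may differ and is not stated in the abstract.

```python
import numpy as np

def psnr(pred, gt, max_val=255.0):
    """Peak signal-to-noise ratio between a restored image and its ground truth.

    Higher is better; identical images give infinity.
    """
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a "dehazed" result that is uniformly off by 16 intensity levels.
gt = np.zeros((4, 4))
pred = np.full((4, 4), 16.0)
score = psnr(pred, gt)
```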

34 citations

Journal ArticleDOI
TL;DR: This study presents a new method to track the driver's facial states, such as head pose and eye blinking, in real time, and suggests that it can be used as a driver drowsiness detector in commercial cars, where the visual conditions are very diverse and often tough to deal with.
Abstract: This study presents a new method to track the driver's facial states, such as head pose and eye blinking, in real time. Since a driver in natural driving conditions moves his head in diverse ways, and his face is often occluded by his hand or the wheel, this poses a great challenge for standard face models. Among many, the Active Appearance Model (AAM) and the Active Shape Model (ASM) are two favored face models. We have extended the discriminative Bayesian ASM to incorporate extreme pose cases, calling it the Pose-Extended Active Shape Model (PE-ASM). Two face databases (DBs) are used for comparison: one is the Boston University face DB and the other is our custom-made driving DB. Our evaluation indicates that PE-ASM outperforms ASM and AAM in terms of face fitting against extreme poses. Using this model, we can estimate the driver's head pose as well as eye blinking by adding the respective processes. Two HMMs are trained to model the temporal behavior of these two facial features, and the system infers from their states whether the driver is drowsy or not. Results suggest that the method can be used as a driver drowsiness detector in commercial cars, where the visual conditions are very diverse and often tough to deal with.
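One way to realize HMM-based drowsiness inference is as a likelihood comparison: score the observed per-frame feature sequence under an "alert" model and a "drowsy" model and pick whichever fits better. This is a sketch of that idea using the forward algorithm on eye-blink observations only; all transition and emission probabilities below are invented for illustration and are not the paper's trained models.

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (forward algorithm with per-step normalization for stability)."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()
    return ll

# Observations per frame: 0 = eyes open, 1 = eyes closed.
pi = np.array([0.5, 0.5])                      # initial state distribution
A = np.array([[0.9, 0.1],                      # 2 hidden states with sticky
              [0.1, 0.9]])                     # transitions
B_alert = np.array([[0.95, 0.05],              # alert model: closed eyes rare
                    [0.70, 0.30]])
B_drowsy = np.array([[0.60, 0.40],             # drowsy model: long closures
                     [0.10, 0.90]])

obs = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]           # mostly-closed eyes
drowsy = forward_loglik(obs, pi, A, B_drowsy) > forward_loglik(obs, pi, A, B_alert)
```

A second HMM over head-pose symbols (nodding, shaking, tilting) would be scored the same way, and the two decisions combined, as the abstract describes.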

32 citations


Cited by
Posted Content
TL;DR: The proposed HRNet is shown to be superior in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named the High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) connect the high-to-low resolution convolution streams in parallel; (ii) repeatedly exchange information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems. All the code is available at this https URL.
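The two key characteristics (parallel multi-resolution streams with repeated exchange) can be sketched in a few lines. The average-pool downsampling and nearest-neighbor upsampling below stand in for HRNet's strided and 1x1 convolutions; this shows only the fusion pattern, not the real network.

```python
import numpy as np

def downsample(x):
    """2x average pooling (stands in for HRNet's strided convolution)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbor upsampling (stands in for bilinear + 1x1 conv)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def exchange(high, low):
    """One multi-resolution fusion step: each stream receives the other,
    resampled to its own resolution, and both keep their original size."""
    return high + upsample(low), low + downsample(high)

high = np.ones((8, 8))        # high-resolution stream (kept throughout)
low = np.full((4, 4), 2.0)    # parallel low-resolution stream, half size
for _ in range(3):            # repeated exchanges, as in HRNet
    high, low = exchange(high, low)
```

The point of the pattern is that the high-resolution stream is never discarded: it is refined in parallel and enriched by the coarser stream at every exchange.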

1,278 citations

Journal ArticleDOI
TL;DR: The High-Resolution Network (HRNet) maintains high-resolution representations through the whole process by connecting the high-to-low resolution convolution streams in parallel and repeatedly exchanging information across resolutions.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named the High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) connect the high-to-low resolution convolution streams in parallel and (ii) repeatedly exchange information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems. All the code is available at https://github.com/HRNet.

1,162 citations

Journal ArticleDOI
TL;DR: McNeill's Hand and Mind: What Gestures Reveal about Thought (Chicago and London: University of Chicago Press, 1992, 416 pp.) examines what spontaneous gestures reveal about thought.
Abstract: Hand and Mind: What Gestures Reveal about Thought. David McNeill. Chicago and London: University of Chicago Press, 1992. 416 pp.

988 citations

Journal ArticleDOI
TL;DR: This article provides a comprehensive survey of deep facial expression recognition (FER), covering datasets and algorithms and offering insight into the field's intrinsic problems: overfitting caused by a lack of sufficient training data, and expression-unrelated variations such as illumination, head pose, and identity bias.
Abstract: With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.

712 citations