Author

Eung-Joo Lee

Bio: Eung-Joo Lee is an academic researcher. The author has contributed to research in topics: Deep learning & Initialization. The author has an h-index of 1 and has co-authored 1 publication receiving 4 citations.

Papers
Proceedings ArticleDOI
01 Nov 2019
TL;DR: A new time transition layer that models variable temporal convolution kernel depths is put forward; this hybrid model is embedded in the proposed 3D CNN, which extends the DenseNet architecture with 3D filters and pooling kernels.
Abstract: In recent years, deep convolutional neural networks (CNNs) have made remarkable progress in static image recognition, but their ability to model motion information in behavioral video remains weak. Therefore, this paper puts forward a new time transition layer that models variable temporal convolution kernel depths. We embed this new hybrid model in our proposed 3D CNN and extend the DenseNet architecture with 3D filters and pooling kernels. Training a 3D convolutional neural network from scratch is time-consuming because it requires a large amount of tagged data. Therefore, the focus of this paper is a simple and effective technique for transferring pre-trained 2D convolutional neural network weights to a randomly initialized 3D convolutional neural network for stable weight initialization; with this initialization, we can still achieve our experimental results while appropriately reducing the number of 3D CNN training samples. Experiments on the UCF-101 database show that the network classifies behavioral video more accurately than other classical algorithms that have appeared in recent years, reflecting the superiority of the algorithm.
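The 2D-to-3D transfer the abstract describes resembles the well-known weight-inflation trick. Below is a minimal PyTorch sketch of that general idea, assuming the common scheme of replicating a pre-trained 2D kernel along the new temporal axis and rescaling it; the paper's exact transfer method may differ, and all sizes here are illustrative.

```python
# Sketch: initialize a 3D conv from a pre-trained 2D conv by "inflating"
# the kernel along the temporal axis (a common scheme; the paper's exact
# transfer method may differ).
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, temporal_depth: int) -> nn.Conv3d:
    """Replicate a 2D kernel along a new temporal axis and rescale so
    activations keep roughly the same magnitude."""
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(temporal_depth, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(temporal_depth // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # (out, in, kH, kW) -> (out, in, T, kH, kW), divided by T
        w2d = conv2d.weight.data
        conv3d.weight.copy_(
            w2d.unsqueeze(2).repeat(1, 1, temporal_depth, 1, 1) / temporal_depth
        )
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias.data)
    return conv3d

# Usage: inflate the first layer of a pre-trained 2D network.
conv2d = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
conv3d = inflate_conv2d_to_3d(conv2d, temporal_depth=3)
x = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, H, W)
print(conv3d(x).shape)
```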

6 citations


Cited by
Journal ArticleDOI
TL;DR: This paper designs and implements a new two-stream model that uses an LSTM-based model in its spatial stream to extract both spatial and temporal features from RGB frames, and implements a DenseNet in the temporal stream to improve recognition accuracy.
Abstract: This paper addresses the recognition of human actions in videos. Human action recognition can be seen as the automatic labeling of a video according to the actions occurring in it, and it has become one of the most challenging and attractive problems in the pattern recognition and video classification fields. The problem is difficult to solve by traditional video processing methods because of several challenges, such as background noise, the sizes of subjects in different videos, and the speed of actions. Building on the progress of deep learning methods, several directions have been developed to recognize a human action from a video, such as the long short-term memory (LSTM)-based model, the two-stream convolutional neural network (CNN) model, and the convolutional 3D model. In this paper, we focus on the two-stream structure. The traditional two-stream CNN network solves the problem that CNNs do not have satisfactory performance on temporal features. By training a temporal stream, which uses the optical flow as input, a CNN gains the ability to extract temporal features. However, the optical flow contains only limited temporal information because it records only the movements of pixels on the x-axis and the y-axis. Therefore, we design and implement a new two-stream model that uses an LSTM-based model in its spatial stream to extract both spatial and temporal features from RGB frames. In addition, we implement a DenseNet in the temporal stream to improve the recognition accuracy. This is in contrast to traditional approaches, which typically utilize the spatial stream to extract only spatial features. The quantitative evaluation and experiments are conducted on the UCF-101 dataset, a well-developed public video dataset. For the temporal stream, we choose the optical flow of UCF-101; images in the optical flow are provided by the Graz University of Technology. The experimental results show that the proposed method outperforms the traditional two-stream CNN method by an accuracy margin of at least 3%. For both spatial and temporal streams, the proposed model also achieves higher recognition accuracies. In addition, compared with state-of-the-art methods, the new model still achieves the best recognition performance.
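A minimal sketch of the two-stream layout described above, assuming late score fusion and a small stand-in CNN in place of the DenseNet: per-frame CNN features feed an LSTM in the spatial (RGB) stream, while stacked optical-flow frames feed a 2D CNN in the temporal stream. All layer sizes are illustrative, not the paper's.

```python
# Sketch of the two-stream layout: LSTM over per-frame features (spatial
# stream) plus a CNN over stacked optical flow (temporal stream).
import torch
import torch.nn as nn

NUM_CLASSES = 101  # UCF-101

class SpatialStream(nn.Module):
    """CNN feature extractor per RGB frame, then an LSTM over time."""
    def __init__(self, feat_dim=256, hidden=512):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, clips):                  # (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))  # (B*T, feat_dim)
        out, _ = self.lstm(feats.view(b, t, -1))
        return self.fc(out[:, -1])             # logits from last step

class TemporalStream(nn.Module):
    """Stand-in for the DenseNet on stacked optical flow
    (2 channels per flow frame: x- and y-displacement)."""
    def __init__(self, flow_frames=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * flow_frames, 96, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(96, NUM_CLASSES),
        )

    def forward(self, flow):                   # (B, 2*L, H, W)
        return self.net(flow)

spatial, temporal = SpatialStream(), TemporalStream()
rgb = torch.randn(2, 16, 3, 112, 112)
flow = torch.randn(2, 20, 112, 112)
fused = spatial(rgb).softmax(-1) + temporal(flow).softmax(-1)  # late fusion
print(fused.argmax(-1))
```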

32 citations

Journal ArticleDOI
TL;DR: In this article, a systematic literature review (SLR) is presented to collect existing research on video-based human activity recognition, summarize, and analyze the state-of-the-art deep learning architectures regarding various methodologies, challenges, and issues.
Abstract: Over the past few decades, human activity recognition (HAR) has been one of the vital research areas in computer vision, and much research in it is ongoing. Researchers' focus is shifting towards this area due to its vast range of real-life applications for assisted daily living. It is therefore necessary to validate HAR performance on standard benchmark datasets and state-of-the-art systems before applying it in real-life applications. The primary objective of this Systematic Literature Review (SLR) is to collect existing research on video-based human activity recognition and to summarize and analyze the state-of-the-art deep learning architectures regarding various methodologies, challenges, and issues. The top five scientific databases (ACM, IEEE, ScienceDirect, SpringerLink, and Taylor & Francis) were accessed for this systematic study, summarizing 70 different research articles on human activity recognition after critical review. Human activity recognition in videos is a challenging problem due to its diverse and complex nature. For accurate video classification, extraction of both spatial and temporal features from video sequences is essential. Therefore, this SLR focuses on reviewing the recent advancements in stratified self-deriving feature-based deep learning architectures. Furthermore, it explores the various deep learning techniques available for HAR, the challenges researchers face in building a robust model, and the state-of-the-art datasets used for evaluation. This SLR intends to provide a baseline for video-based human activity recognition research while emphasizing several challenges regarding human activity recognition accuracy in video sequences using deep neural architectures.

11 citations

Journal ArticleDOI
TL;DR: In this paper, Zhang et al. used various neural network architectures: a multilayer perceptron, 1- and 2-dimensional convolutional networks, and 3-dimensional CNNs.
Abstract: Coaches and athletes need to understand the kinematics and dynamics of karate punches to improve the training process and results. The research was aimed at the automatic recognition of punches in karate using only linear acceleration sensors. The accelerometers were part of Inertial Measurement Units (IMUs) attached to the athlete's left and right wrists. To develop a model of the punches, highly qualified athletes with 3 to 7 years of karate experience participated in the research. We analyzed the acceleration fields of various karate punches: Yun Tsuki, Mawashi Tsuki, Age Tsuki, and Uraken. We have proposed a more straightforward approach to extracting features that does not require calculating their statistical characteristics. To solve the classification problem, we used various neural network architectures: a multilayer perceptron and 1- and 2-dimensional convolutional networks. Since punch recognition was carried out under shadow-fight conditions, another output class was introduced in addition to the punches: movement without punches. The studies showed a high level of punch recognition with the developed models: the multi-class accuracy is 0.96, and the average F1 value is 0.97 for the five classes. Thus, the proposed approach is well suited for practical implementation in automatic learning systems.
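As a concrete illustration of the feature-free approach above, here is a minimal sketch of a 1D-convolutional classifier over raw acceleration windows, assuming a window of 128 samples and six channels (three axes for each of the two wrist IMUs); all sizes and layer widths are illustrative assumptions, not the authors' configuration.

```python
# Sketch: 1D CNN over raw IMU acceleration windows, no hand-crafted
# statistical features (window length and channel layout are assumed).
import torch
import torch.nn as nn

NUM_CLASSES = 5   # four punch types + movement without punches
WINDOW = 128      # samples per window (assumed)
CHANNELS = 6      # x/y/z acceleration for left and right wrist

model = nn.Sequential(
    nn.Conv1d(CHANNELS, 32, kernel_size=7, padding=3), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool1d(2),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, NUM_CLASSES),
)

batch = torch.randn(8, CHANNELS, WINDOW)  # a batch of raw windows
logits = model(batch)                     # (8, 5)
print(logits.argmax(dim=1))
```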

1 citation

Journal ArticleDOI
TL;DR: In this article, an iterative random training sampling convolutional neural network (IRTS-CNN) was proposed to improve the performance of hyperspectral image classification (HSIC).
Abstract: The convolutional neural network (CNN) has lately received considerable interest in hyperspectral image classification (HSIC) due to its excellent spectral-spatial feature extraction capability. To improve CNNs, many approaches have explored the network's infrastructure by introducing different paradigms. This paper takes a rather different approach by developing an iterative CNN, which extends a CNN with a feedback system that reprocesses the same CNN in an iterative manner. The idea is to take advantage of the recently developed iterative training sampling spectral-spatial classification (IRTS-SSC), which allows a CNN to update the spatial information of its classification maps through a feedback spatial filtering system via IRTS. The resulting CNN is called the iterative random training sampling CNN (IRTS-CNN), and it has several unique features. First, IRTS-CNN combines CNN and IRTS-SSC into one paradigm, an architecture that has never been investigated before. Second, it implements a series of spatial filters to capture the spatial information of classified data samples and feeds this information back via an iterative process to expand the current input data cube for the next iteration. Third, it utilizes the expanded data cube to randomly re-select training samples and then re-implements the CNN iteratively. Last but not least, IRTS-CNN provides a general framework that can take any arbitrary CNN as an initial classifier and improve its performance through an iterative process. Extensive experiments demonstrate that IRTS-CNN indeed significantly improves a CNN, specifically when only a small number of training samples is used.
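The feedback loop described above can be sketched schematically as follows. This is not the authors' code: train_cnn and predict_probs are hypothetical stand-ins for an arbitrary base classifier, and a simple box blur stands in for the paper's spatial filters.

```python
# Schematic sketch of the iterative feedback loop: train, classify,
# spatially smooth the class-probability maps, stack them onto the input
# cube as extra bands, re-draw random training samples, and retrain.
import torch
import torch.nn.functional as F

def irts_cnn(cube, labels, train_cnn, predict_probs, iters=3, n_train=200):
    """cube: (bands, H, W) hyperspectral image; labels: (H, W) ints."""
    for _ in range(iters):
        # Randomly re-select training pixels from the (expanded) cube.
        idx = torch.randperm(labels.numel())[:n_train]
        model = train_cnn(cube, labels, idx)
        probs = predict_probs(model, cube)  # (classes, H, W)
        # Spatial filter: a simple box blur of each probability map.
        smoothed = F.avg_pool2d(
            probs.unsqueeze(0), kernel_size=3, stride=1, padding=1
        ).squeeze(0)
        # Feed the filtered maps back as additional spectral bands.
        cube = torch.cat([cube, smoothed], dim=0)
    return model

# Hypothetical stand-ins so the sketch runs end to end.
def train_cnn(cube, labels, idx):
    n_classes = int(labels.max()) + 1
    return torch.randn(n_classes, cube.shape[0])  # dummy linear "model"

def predict_probs(model, cube):
    scores = torch.einsum("cb,bhw->chw", model, cube)
    return scores.softmax(dim=0)

cube = torch.randn(30, 64, 64)          # 30-band toy image
labels = torch.randint(0, 4, (64, 64))  # 4 classes
model = irts_cnn(cube, labels, train_cnn, predict_probs)
```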

1 citation

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the authors proposed a theory for emotion detection followed by the recommendation of a song to enhance the user's mood, using the features provided by deep learning and image processing.
Abstract: It is said that health is wealth. Here, health refers to both physical and mental health. People take various measures to care for their physical health but ignore their mental health, which can lead to depression and even diseases such as diabetes mellitus. Emotion detection can help us diagnose our mental health status. Therefore, this paper proposes a theory for emotion detection followed by the recommendation of a song to enhance the user's mood, using the features provided by deep learning and image processing. Here, a convolutional neural network (CNN) based on the LeNet architecture is used for emotion detection. The KDEF dataset is used as input to train the CNN model for detecting emotion. After training, a training accuracy of 98.03% and a validation accuracy of 97.96% are achieved for correctly recognizing the seven different emotions, namely sad, disgust, happy, afraid, neutral, angry, and surprise, through facial expressions.
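For reference, a LeNet-style CNN of the kind named above can be sketched as below, assuming 48x48 grayscale face crops as input; the input size and layer widths are illustrative assumptions, not the paper's configuration.

```python
# Sketch: LeNet-style CNN for 7-class facial-emotion classification
# (input size and widths are assumptions, not the paper's setup).
import torch
import torch.nn as nn

EMOTIONS = ["sad", "disgust", "happy", "afraid", "neutral", "angry", "surprise"]

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 10 * 10, 120), nn.ReLU(),
    nn.Linear(120, 84), nn.ReLU(),
    nn.Linear(84, len(EMOTIONS)),
)

face = torch.randn(1, 1, 48, 48)  # one pre-cropped grayscale face
print(EMOTIONS[lenet(face).argmax().item()])
```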

1 citation