Author

Thanh-Hai Tran

Bio: Thanh-Hai Tran is an academic researcher from Hanoi University of Science and Technology. The author has contributed to research in topics including computer science and gesture recognition. The author has an h-index of 10 and has co-authored 59 publications receiving 305 citations. Previous affiliations of Thanh-Hai Tran include Huazhong University of Science and Technology.

Papers published on a yearly basis

Papers
Journal ArticleDOI
TL;DR: An assistive system for visually impaired people based on an electrode matrix and a mobile Kinect, which represents detected obstacle information in the form of an electrode matrix.
Abstract: Obstacle detection and warning can improve both the mobility and the safety of visually impaired people, especially in unfamiliar environments. To this end, obstacles are first detected and localized, and their information is then conveyed to the visually impaired person through different modalities such as voice, tactile feedback, or vibration. In this paper, we present an assistive system for visually impaired people based on an electrode matrix and a mobile Kinect. This system consists of two main components: environment information acquisition and analysis, and information representation. The first component captures the environment with a mobile Kinect and analyzes it to detect predefined obstacles for visually impaired people, while the second component represents obstacle information in the form of an electrode matrix.
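To make the pipeline concrete, here is a minimal sketch (in Python, not the authors' code) of how a Kinect depth frame could be reduced to a coarse binary grid of the kind an electrode matrix can display; the function name, warning range, noise threshold, and grid size are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): map close obstacles in a Kinect
# depth frame onto a coarse grid, mimicking an electrode-matrix display.

def obstacle_grid(depth_mm: np.ndarray, max_range_mm: int = 2000,
                  grid_shape: tuple = (4, 4)) -> np.ndarray:
    """Return a binary grid marking cells that contain a close obstacle."""
    h, w = depth_mm.shape
    gh, gw = grid_shape
    # Valid pixels closer than the warning range count as obstacle evidence.
    close = (depth_mm > 0) & (depth_mm < max_range_mm)
    grid = np.zeros(grid_shape, dtype=bool)
    for i in range(gh):
        for j in range(gw):
            cell = close[i * h // gh:(i + 1) * h // gh,
                         j * w // gw:(j + 1) * w // gw]
            # Require a minimum fraction of close pixels to suppress noise.
            grid[i, j] = cell.mean() > 0.2
    return grid

frame = np.random.randint(500, 4000, size=(480, 640)).astype(np.uint16)
print(obstacle_grid(frame))
```

Each True cell would then drive the corresponding electrode, so the spatial layout of nearby obstacles maps directly onto the user's skin.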

45 citations

Journal ArticleDOI
TL;DR: The proposed method using multi-modal features achieves higher accuracy than unimodal features and shows the potential to be applied to any real living space.

40 citations

Journal ArticleDOI
TL;DR: A method for recognizing human activity from wearable sensors based on a capsule network named SensCapsNet is proposed, and a life-logging application is developed that achieves real-time computation and an accuracy greater than 80% for 5 common upper-body activities.
Abstract: The recent advancement of deep learning, with its capacity for automatic high-level feature extraction, has achieved promising performance in sensor-based human activity recognition (HAR). Among deep learning methods, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks have been widely adopted. However, the scalar outputs and pooling in CNNs capture only invariance, not equivariance. Capsule networks (CapsNet), with vector outputs and routing by agreement, are able to capture equivariance. In this paper, we propose a method for recognizing human activity from wearable sensors based on a capsule network named SensCapsNet. The architecture of SensCapsNet is designed to suit the spatial-temporal data coming from wearable sensors. Experimental results show that the proposed network outperforms CNN and LSTM methods. The performance of the proposed CapsNet architecture is assessed by altering the dynamic routing between capsule layers. With one routing iteration, SensCapsNet yields improved accuracies of 77.7% and 70.5% on two testing datasets, compared with baseline CNN and LSTM methods that yield F1-scores of 67.7% and 69.2% on the first dataset and 65.3% and 67.6% on the second dataset, respectively. Moreover, although several human activity datasets are available, privacy invasion and obtrusiveness concerns have not been carefully taken into consideration in dataset building. Toward a non-obtrusive sensing-based human activity recognition method, we design a dataset named 19NonSens, collected from twelve subjects wearing e-Shoes and a smart watch while performing 19 activities under multiple contexts. This dataset will be made publicly available. Finally, building on the promising results of the proposed method, we develop a life-logging application that achieves real-time computation and an accuracy greater than 80% for 5 common upper-body activities.
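As a concrete illustration of the routing-by-agreement mechanism that capsule networks such as SensCapsNet build on, the following NumPy sketch implements dynamic routing between two capsule layers; the shapes (32 primary capsules, 19 activity capsules, 16-dimensional outputs) are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

# Minimal sketch of routing-by-agreement between two capsule layers.
# Shapes and the iteration count are illustrative.

def squash(v, axis=-1, eps=1e-8):
    """Capsule non-linearity: shrink short vectors, preserve direction."""
    norm2 = np.sum(v ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=1):
    """u_hat: predictions from lower capsules, shape (n_in, n_out, dim_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coeffs
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted sum
        v = squash(s)                                         # output capsules
        b = b + (u_hat * v[None]).sum(axis=-1)                # agreement update
    return v

u_hat = np.random.randn(32, 19, 16)  # 32 primary capsules -> 19 activity capsules
print(dynamic_routing(u_hat, n_iters=1).shape)  # (19, 16)
```

Note the paper reports its best results with one routing iteration, which is why `n_iters=1` is the default here.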

37 citations

Proceedings ArticleDOI
01 Aug 2018
TL;DR: A large continuous multimodal multi-view dataset for human fall detection, namely CMDFALL, is introduced, and the role of each modality is investigated to obtain the best results in the context of human activity recognition.
Abstract: Over the last decade, a large number of methods have been proposed for human fall detection. Most existing methods were evaluated on trimmed datasets. More importantly, these datasets lack variety in falls, subjects, views, and modalities. This paper makes two contributions to the topic of automatic human fall detection. Firstly, to address the above issues, we introduce a large continuous multimodal multi-view dataset of human falls, namely CMDFALL. Our CMDFALL dataset was built by capturing activities from 50 subjects with seven overlapping Kinect sensors and two wearable accelerometers. Each subject performs 20 activities, including 8 falls of different styles and 12 daily activities. All multimodal multi-view data (RGB, depth, skeleton, acceleration) are time-synchronized and annotated for evaluating the performance of human activity or fall recognition algorithms in indoor environments. Secondly, based on the multimodal property of the dataset, we investigate the role of each modality in obtaining the best results in the context of human activity recognition. To this end, we adopt existing baseline techniques that have been shown to be very efficient for each data modality: C3D convnet on RGB, DMM-KDES on depth, Res-TCN on skeleton, and 2D convnet on acceleration data. Our analysis shows which modalities, and which combinations of them, give the best performance.
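A dataset like this depends on time synchronization across modalities. The following sketch (not the authors' tooling) shows one standard way to align a high-rate accelerometer stream to lower-rate Kinect frame timestamps by nearest-timestamp matching; the sampling rates and function names are assumptions for illustration.

```python
import numpy as np

# Illustrative sketch: align a higher-rate accelerometer stream to Kinect
# frame timestamps by nearest-timestamp matching, the kind of
# time-synchronization a multimodal dataset such as CMDFALL requires.

def align_to_frames(frame_ts: np.ndarray, accel_ts: np.ndarray,
                    accel_xyz: np.ndarray) -> np.ndarray:
    """For each video frame, pick the accelerometer sample closest in time."""
    idx = np.searchsorted(accel_ts, frame_ts)
    idx = np.clip(idx, 1, len(accel_ts) - 1)
    # Choose the nearer of the two neighbouring samples.
    left_closer = (frame_ts - accel_ts[idx - 1]) < (accel_ts[idx] - frame_ts)
    idx = np.where(left_closer, idx - 1, idx)
    return accel_xyz[idx]

frame_ts = np.arange(0.0, 2.0, 1 / 20)    # assumed 20 fps Kinect stream
accel_ts = np.arange(0.0, 2.0, 1 / 100)   # assumed 100 Hz accelerometer
accel_xyz = np.random.randn(len(accel_ts), 3)
print(align_to_frames(frame_ts, accel_ts, accel_xyz).shape)  # (40, 3)
```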

31 citations

Journal ArticleDOI
TL;DR: The proposed way-finding system, deployed on a mobile robot to assist visually impaired (VI) people, adapts state-of-the-art techniques to the practical issues at hand and shows that VI people can find the right way to requested targets.
Abstract: A way-finding system in an indoor environment consists of several components: localization, representation, path planning, and interaction. For each component, numerous relevant techniques have been proposed. However, deploying feasible techniques, particularly in real scenarios, remains challenging. In this paper, we describe a functional way-finding system deployed on a mobile robot to assist visually impaired (VI) people. The proposed system adapts state-of-the-art techniques to the practical issues at hand. First, we adapt an outdoor visual odometry technique to indoor use by placing manual markers or stickers on ground planes. The main purpose is to build reliable travel routes in the environment. Second, we propose a procedure to define and optimize the landmark/representative scenes of the environment. This technique handles the repetitive and ambiguous structures of the environment. To interact with VI people, we deploy a convenient interface on a smartphone. Our evaluations cover three different indoor scenarios and thirteen subjects. The experimental results show that VI people, particularly VI pupils, can find the right way to requested targets.
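As a concrete illustration of the path-planning component, the sketch below runs Dijkstra's algorithm over a small graph of landmark scenes connected by travel routes; the landmark names and edge costs are hypothetical, not taken from the paper's test environments.

```python
import heapq

# Illustrative path-planning sketch: Dijkstra over a graph of landmark
# scenes connected by travel routes. Nodes and costs are hypothetical.

def shortest_path(graph, start, goal):
    """graph: {node: [(neighbor, cost), ...]} -> (total_cost, path)."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

graph = {
    "entrance": [("corridor", 5.0)],
    "corridor": [("entrance", 5.0), ("stairs", 3.0), ("lab", 7.0)],
    "stairs":   [("corridor", 3.0)],
    "lab":      [("corridor", 7.0)],
}
print(shortest_path(graph, "entrance", "lab"))
# (12.0, ['entrance', 'corridor', 'lab'])
```

The resulting path would then be narrated step by step through the smartphone interface.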

26 citations


Cited by
Journal ArticleDOI
TL;DR: Different levels of an intelligent video surveillance system (IVVS) are studied in this paper, where techniques related to feature extraction and description for behavior representation are reviewed, and available datasets and metrics for performance evaluation are presented.
Highlights: Different levels of an intelligent video surveillance system (IVVS) are studied in this review. Existing approaches for abnormal behavior recognition relative to each level of an IVVS are extensively reviewed. Challenging datasets for IVVS evaluation are presented. Limitations of the abnormal behavior recognition area are discussed.

Abstract: With the increasing number of surveillance cameras in both indoor and outdoor locations, there is a growing demand for an intelligent system that detects abnormal events. Although human action recognition is a widely researched topic in computer vision, abnormal behavior detection has lately attracted more research attention. Indeed, several systems have been proposed to ensure human safety. In this paper, we are interested in the study of the two main steps composing a video surveillance system: behavior representation and behavior modeling. Techniques related to feature extraction and description for behavior representation are reviewed. Classification methods and frameworks for behavior modeling are also provided. Moreover, available datasets and metrics for performance evaluation are presented. Finally, examples of existing video surveillance systems used in the real world are described.
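To make the review's two-step decomposition concrete, here is a toy pipeline (with synthetic data, not from any system in the survey) in which a behavior representation, a simple motion-energy histogram per clip, feeds a behavior model, a binary normal/abnormal classifier.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-stage IVVS pipeline: behavior representation (a motion-energy
# histogram per clip) followed by behavior modeling (an SVM classifier).
# Clips and labels below are synthetic placeholders.

def motion_histogram(frames: np.ndarray, bins: int = 16) -> np.ndarray:
    """frames: (T, H, W) grayscale; describe a clip by frame-difference energy."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    hist, _ = np.histogram(diffs, bins=bins, range=(0, 255), density=True)
    return hist

clips = np.random.randint(0, 256, size=(40, 10, 32, 32))  # 40 synthetic clips
X = np.stack([motion_histogram(c) for c in clips])        # representation step
y = np.random.randint(0, 2, size=40)                      # 0 = normal, 1 = abnormal
clf = SVC().fit(X, y)                                     # modeling step
print(clf.predict(X[:5]))
```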

243 citations

Journal ArticleDOI
TL;DR: In this article, the authors summarize available data sets and relevant studies on recent developments in 3D point cloud semantic segmentation (PCSS) and point cloud segmentation (PCS).
Abstract: Ripe with possibilities offered by deep-learning techniques and useful in applications related to remote sensing, computer vision, and robotics, 3D point cloud semantic segmentation (PCSS) and point cloud segmentation (PCS) are attracting increasing interest. This article summarizes available data sets and relevant studies on recent developments in PCSS and PCS.
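For readers new to the distinction the survey draws, plain point cloud segmentation (PCS) can be as simple as Euclidean clustering; the sketch below, using synthetic data rather than any benchmark from the survey, separates two object-like blobs with DBSCAN.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Minimal PCS sketch: Euclidean clustering splits a scene into object
# instances. The two synthetic blobs stand in for real scan data.

rng = np.random.default_rng(0)
cloud = np.vstack([
    rng.normal(loc=(0.0, 0.0, 0.0), scale=0.05, size=(200, 3)),  # object 1
    rng.normal(loc=(1.0, 0.0, 0.0), scale=0.05, size=(200, 3)),  # object 2
])
labels = DBSCAN(eps=0.2, min_samples=10).fit_predict(cloud)
print(np.unique(labels))  # two clusters (0 and 1); -1 would mark noise
```

Semantic segmentation (PCSS) goes further by assigning a class label to every point, which is where the deep-learning methods the survey reviews come in.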

205 citations

Journal ArticleDOI
TL;DR: An original "task-oriented" way to categorize the state of the art in assistive technology (AT) is introduced: final assistive goals are split into tasks, which then serve as pointers to the works in the literature that use each task as a component.

183 citations

Journal ArticleDOI
TL;DR: This paper presents a low-cost descriptor called 3D histograms of texture (3DHoTs) to extract discriminant features from a sequence of depth maps, and adds a new multi-class constraint into the objective function, which helps to maintain a better margin distribution by maximizing the mean of the margin while still minimizing its variance.
Abstract: Human action recognition is an important yet challenging task. This paper presents a low-cost descriptor called 3D histograms of texture (3DHoTs) to extract discriminant features from a sequence of depth maps. 3DHoTs are derived by projecting depth frames onto three orthogonal Cartesian planes, i.e., the frontal, side, and top planes, thus compactly characterizing the salient information of a specific action, on which texture features are calculated to represent the action. Besides this fast feature descriptor, a new multi-class boosting classifier (MBC) is also proposed to efficiently exploit different kinds of features in a unified framework for action classification. Compared with existing boosting frameworks, we add a new multi-class constraint into the objective function, which helps to maintain a better margin distribution by maximizing the mean of the margin while still minimizing its variance. Experiments on the MSRAction3D, MSRGesture3D, MSRActivity3D, and UTD-MHAD data sets demonstrate that the proposed system combining 3DHoTs and MBC is superior to the state of the art.
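The core geometric idea, projecting a depth sequence's motion onto three orthogonal planes before computing texture features, can be sketched as follows. This is a simplified variant with illustrative parameters; the paper's texture descriptors and boosting classifier are omitted.

```python
import numpy as np

# Simplified sketch of the projection step behind 3DHoTs: accumulate
# frame-to-frame motion of a depth sequence on three orthogonal planes
# (front, side, top). Depth is quantized into bins for the side/top views.

def three_plane_motion_maps(depth, n_bins=64):
    """depth: (T, H, W) array of depth values."""
    t, h, w = depth.shape
    bins = np.clip((depth / depth.max() * (n_bins - 1)).astype(int), 0, n_bins - 1)
    front = np.zeros((h, w))
    side = np.zeros((h, n_bins))
    top = np.zeros((n_bins, w))
    prev = None
    for i in range(t):
        # Occupancy volume: which (row, col, depth-bin) cells are filled.
        vol = np.zeros((h, w, n_bins), dtype=int)
        vol[np.arange(h)[:, None], np.arange(w)[None, :], bins[i]] = 1
        if prev is not None:
            motion = np.abs(vol - prev)          # cells that changed
            front += motion.max(axis=2)          # project along depth axis
            side += motion.max(axis=1)           # project along width axis
            top += motion.max(axis=0).T          # project along height axis
        prev = vol
    return front, side, top

depth = np.random.randint(1, 256, size=(8, 60, 80)).astype(float)
f, s, tp = three_plane_motion_maps(depth)
print(f.shape, s.shape, tp.shape)  # (60, 80) (60, 64) (64, 80)
```

In the full method, texture histograms computed on these three maps form the 3DHoTs descriptor that the multi-class boosting classifier consumes.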

142 citations