Author

Hemerson Tacon

Bio: Hemerson Tacon is an academic researcher from Universidade Federal de Juiz de Fora. The author has contributed to research in topics: Convolutional neural network & Deep learning. The author has an h-index of 3 and has co-authored 7 publications receiving 24 citations.

Papers
Proceedings ArticleDOI
01 Dec 2018
TL;DR: A multi-stream network is the architecture of choice to incorporate temporal information, since it may benefit from pre-trained deep networks for images and from handcrafted features for initialization, and its training cost is usually lower than that of video-based networks.
Abstract: Advances in digital technology have increased event recognition capabilities through the development of devices with high resolution, small physical dimensions and high sampling rates. The recognition of complex events in videos has several relevant applications, particularly due to the large availability of digital cameras in environments such as airports, banks, roads, among others. The large amount of data produced is the ideal scenario for the development of automatic methods based on deep learning. Despite the significant progress achieved through image-based deep networks, video understanding still faces challenges in modeling spatio-temporal relations. In this work, we address the problem of human action recognition in videos. A multi-stream network is our architecture of choice to incorporate temporal information, since it may benefit from pre-trained deep networks for images and from handcrafted features for initialization. Furthermore, its training cost is usually lower than video-based networks. We explore visual rhythm images since they encode longer-term information when compared to still frames and optical flow. We propose a novel method based on point tracking for deciding the best visual rhythm direction for each video. Experiments conducted on the challenging UCF101 and HMDB51 data sets indicate that our proposed stream improves network performance, achieving accuracy rates comparable to the state-of-the-art approaches.
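As a hedged illustration (not the authors' code), a visual rhythm of the kind described above is typically built by sampling one line of pixels per frame and stacking those lines over time; a minimal Python/NumPy sketch, with the sampling direction exposed as a parameter, might look like this:

```python
import numpy as np

def visual_rhythm(frames, direction="horizontal"):
    """Stack one line of pixels per frame into a 2D visual-rhythm image.

    frames: iterable of grayscale frames (H x W numpy arrays).
    direction: "horizontal" takes the central row of each frame,
               "vertical" takes the central column.
    Returns an array of shape (num_frames, W) or (num_frames, H).
    """
    lines = []
    for frame in frames:
        h, w = frame.shape[:2]
        if direction == "horizontal":
            lines.append(frame[h // 2, :])   # central row
        else:
            lines.append(frame[:, w // 2])   # central column
    return np.stack(lines, axis=0)
```

The paper additionally selects the best rhythm direction per video via point tracking; that selection step is omitted from this sketch.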

21 citations

Book ChapterDOI
01 Jul 2019
TL;DR: This work proposes the usage of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride, which provide a 2D representation of the video volume matching the fixed input size of the 2D Convolutional Neural Network employed.
Abstract: Despite the expressive progress of deep learning models on the image classification task, they still need enhancement for efficient human action recognition. One way to achieve such gain is to augment the existing datasets. With this goal, we propose the usage of multiple Visual Rhythm crops, symmetrically extended in time and separated by a fixed stride. The symmetric extension preserves the video frame rate, which is crucial to not distort actions. The crops provide a 2D representation of the video volume matching the fixed input size of the 2D Convolutional Neural Network (CNN) employed. In addition, multiple crops with stride guarantee coverage of the entire video. Aiming to evaluate our method, a multi-stream strategy combining RGB and Optical Flow information is extended to include the Visual Rhythm. Accuracy rates fairly close to the state-of-the-art were obtained from the experiments with our method on the challenging UCF101 and HMDB51 datasets.
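A minimal sketch of the cropping scheme described above, assuming the rhythm is a NumPy array with one row per frame; the crop length and stride values are illustrative, not taken from the paper:

```python
import numpy as np

def rhythm_crops(rhythm, crop_len=224, stride=112):
    """Cut a visual-rhythm image (T x W) into fixed-size temporal crops."""
    while rhythm.shape[0] < crop_len:
        # symmetric extension: append the time-reversed rhythm, which
        # preserves the original frame rate instead of stretching the action
        rhythm = np.concatenate([rhythm, rhythm[::-1]], axis=0)
    t = rhythm.shape[0]
    starts = list(range(0, t - crop_len + 1, stride))
    if starts[-1] != t - crop_len:
        starts.append(t - crop_len)   # make sure the tail of the video is covered
    return [rhythm[s:s + crop_len] for s in starts]
```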

6 citations

Journal ArticleDOI
TL;DR: A multi-stream architecture based on the weighted voting of convolutional neural networks is proposed to deal with the problem of recognizing human actions in videos, introducing a new stream, the Optical Flow Rhythm, alongside other streams for diversity.
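The weighted voting mentioned in the TL;DR is a form of late fusion; a rough sketch under assumed stream names and weights (both illustrative, not the paper's values) could be:

```python
import numpy as np

def weighted_vote(stream_scores, weights):
    """Late fusion by weighted voting.

    stream_scores: dict mapping stream name -> (num_classes,) softmax scores
                   for one video (e.g. "rgb", "optical_flow", "flow_rhythm").
    weights:       dict mapping stream name -> scalar weight.
    Returns the predicted class index.
    """
    fused = sum(weights[name] * scores for name, scores in stream_scores.items())
    return int(np.argmax(fused))

# illustrative usage with made-up scores for a 3-class problem
scores = {
    "rgb":          np.array([0.2, 0.5, 0.3]),
    "optical_flow": np.array([0.1, 0.3, 0.6]),
    "flow_rhythm":  np.array([0.3, 0.3, 0.4]),
}
weights = {"rgb": 1.0, "optical_flow": 1.5, "flow_rhythm": 1.0}
print(weighted_vote(scores, weights))
```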

4 citations

Book ChapterDOI
01 Jan 2020
TL;DR: A different pre-training procedure is developed for the visual rhythm stream, using visual rhythm images extracted from a large and challenging video dataset, Kinetics, to better suit a representation that harshly deforms the silhouettes of actors and objects.
Abstract: Human action recognition aims to classify trimmed videos based on the action being performed by one or more agents. It can be applied to a large variety of tasks, such as surveillance systems, intelligent homes, health monitoring, and human-computer interaction. Despite the significant progress achieved through image-based deep networks, video understanding still faces challenges in modeling spatiotemporal relations. The inclusion of temporal information in the network may lead to significant growth in the training cost. To address this issue, we explore complementary handcrafted features to feed pre-trained two-dimensional (2D) networks in a multi-stream fashion. In addition to the commonly used RGB and optical flow streams, we propose the use of a stream based on visual rhythm images that encode long-term information. Previous works have shown that either RGB or optical flow streams may benefit from pre-training on ImageNet since they maintain a certain level of object shape. The visual rhythm, on the other hand, harshly deforms the silhouettes of the actors and objects. Therefore, we develop a different pre-training procedure for the latter stream using visual rhythm images extracted from a large and challenging video dataset, the Kinetics.
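A minimal sketch of such a pre-training stage in PyTorch, assuming a recent torchvision; the backbone choice, optimizer settings, and data loader are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn as nn
from torchvision import models

# assumed setup: visual-rhythm images rendered from Kinetics, 400 action classes
NUM_KINETICS_CLASSES = 400

model = models.resnet50(weights=None)   # backbone choice is illustrative
model.fc = nn.Linear(model.fc.in_features, NUM_KINETICS_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def pretrain_epoch(loader):
    """One epoch over a loader yielding (rhythm_image, label) batches."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# after pre-training, the same backbone would be fine-tuned on the target
# dataset (UCF101 / HMDB51) with a new classification head
```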

3 citations

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This work addresses the problem of human action recognition in videos through a multi-stream network that incorporates both spatial and temporal information, employing a deep network to extract features from the video frames at multiple depths in order to generate a Learnable Visual Rhythm.
Abstract: Recent deep learning techniques have achieved satisfactory results for various image-related problems. However, many research questions remain open in tasks involving video sequences. Several applications demand the understanding of complex events in videos, such as traffic monitoring, person re-identification, security and surveillance. In this work, we address the problem of human action recognition in videos through a multi-stream network that incorporates both spatial and temporal information. The main contribution of our work is a stream based on a new variant of the visual rhythm, called Learnable Visual Rhythm (LVR). We employ a deep network to extract features from the video frames in order to generate the rhythm. The features are collected at multiple depths of the network to enable the analysis of different abstraction levels. This strategy significantly outperforms the handcrafted version on the UCF101 and HMDB51 datasets. Experiments conducted on these datasets show that our final multi-stream network achieved competitive results compared to state-of-the-art approaches.
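A rough sketch of how per-frame features from multiple depths of a 2D backbone could be stacked over time into a rhythm-like image, assuming a recent torchvision; the chosen backbone, tapped layers, and pooling are illustrative, not the authors' exact design:

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.eval()

# tap two depths of the backbone; which layers to use is an assumption here
taps = {"shallow": backbone.layer2, "deep": backbone.layer4}
features = {}

def make_hook(name):
    def hook(module, inputs, output):
        # global-average-pool the feature map into one descriptor per frame
        features[name] = output.mean(dim=(2, 3)).squeeze(0)
    return hook

for name, layer in taps.items():
    layer.register_forward_hook(make_hook(name))

@torch.no_grad()
def learnable_rhythm(frames):
    """frames: list of (3, H, W) tensors. Returns one 2D image per tapped depth,
    with one row of deep features per frame (a rhythm built from learned features)."""
    rows = {name: [] for name in taps}
    for frame in frames:
        backbone(frame.unsqueeze(0))
        for name in taps:
            rows[name].append(features[name])
    return {name: torch.stack(r, dim=0) for name, r in rows.items()}
```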

2 citations


Cited by
Journal ArticleDOI
TL;DR: Wang et al. propose a framework with three main phases for human action recognition, namely pre-training, preprocessing, and recognition, which achieves state-of-the-art performance.
Abstract: Human action recognition techniques have gained significant attention among next-generation technologies due to their specific features and high capability to inspect video sequences to understand human actions. As a result, many fields have benefited from human action recognition techniques. Deep learning techniques have played a primary role in many approaches to human action recognition, and transfer learning is spreading a new era of learning. Accordingly, this study's main objective is to propose a framework with three main phases for human action recognition: pre-training, preprocessing, and recognition. The framework presents a set of novel techniques that are three-fold, as follows: (i) in the pre-training phase, a standard convolutional neural network is trained on a generic dataset to adjust weights; (ii) this pre-trained model is then applied to the target dataset to perform the recognition process; and (iii) the recognition phase exploits convolutional neural network and long short-term memory models in five different architectures. Three architectures are stand-alone and single-stream, while the other two combine the first three in a two-stream style. Experimental results show that the first three architectures recorded accuracies of 83.24%, 90.72%, and 90.85%, respectively. The last two architectures achieved accuracies of 93.48% and 94.87%, respectively. Moreover, the recorded results outperform other state-of-the-art models in the same field.
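A minimal PyTorch sketch of one single-stream CNN + LSTM architecture of the kind described above, assuming a recent torchvision; the backbone and layer sizes are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

class CnnLstmClassifier(nn.Module):
    """Per-frame CNN features fed to an LSTM, one prediction per clip."""

    def __init__(self, num_classes, hidden=256):
        super().__init__()
        cnn = models.resnet18(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(cnn.children())[:-1])  # drop the fc head
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                            # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))       # (B*T, 512, 1, 1)
        feats = feats.flatten(1).view(b, t, -1)          # (B, T, 512)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])                        # (B, num_classes)
```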

24 citations

Proceedings ArticleDOI
05 Jun 2019
TL;DR: This work proposes and evaluates a multi-stream learning model based on convolutional neural networks using high-level handcrafted features as input in order to cope with sporadic falls, and shows that this approach outperforms, in terms of accuracy and sensitivity rates, other similar methods found in the literature.
Abstract: Sporadic falls, due to the lack of balance and other factors, are among the complications that elderly people may experience more frequently than others. Since there is a high probability of these events causing major health casualties, such as broken bones or head clots, studies have been monitoring these falls in order to rapidly assist the victim. In this work, we propose and evaluate a multi-stream learning model based on convolutional neural networks that uses high-level handcrafted features as input in order to cope with this situation. Our approach consists of extracting high-level handcrafted features, for instance, human pose estimation and optical flow, and using each one as input for a distinct VGG-16 classifier. In addition, these experiments showcase which features can be used in fall detection. The results show that, by assembling our directed input learners, our approach outperforms, in terms of accuracy and sensitivity rates, other similar methods found in the literature.
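One of the handcrafted inputs mentioned above, optical flow, can be rendered as an image suitable for feeding a VGG-16 stream; a hedged OpenCV sketch with typical Farneback parameters (not the paper's settings) is:

```python
import cv2
import numpy as np

def flow_image(prev_gray, next_gray):
    """Dense optical flow (Farneback) rendered as a color image so it can be
    fed to an image classifier such as VGG-16; parameter values are common
    defaults, not those used in the paper."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                       # hue encodes direction
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value encodes magnitude
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
```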

20 citations

Proceedings ArticleDOI
18 Jul 2022
TL;DR: Zhang et al. propose a graph-based framework for action recognition that models the spatio-temporal interactions among the entities in a video without any object-level supervision.
Abstract: Action recognition requires modelling the interactions between either human & human or human & objects. Recently, graph convolutional neural networks (GCNs) have been exploited to effectively capture the structure of an action by modelling the relationships among the entities present in a video. However, most approaches depend on the effectiveness of object detection frameworks to detect the entities. In this paper, we propose a graph-based framework for action recognition to model the spatio-temporal interactions among the entities in a video without any object-level supervision. First, we obtain the salient space-time interest points (STIPs), which contain rich information about the significant local variations in space and time, by using the Harris 3D detector. In order to incorporate the local appearance and motion information of the entities, either low-level or deep features are extracted around these STIPs. Next, we build a graph by considering the extracted STIPs as nodes, which are connected by spatial edges and temporal edges. These edges are determined based on a membership function that measures the similarity of the entities associated with the STIPs. Finally, a GCN is employed on the given graph to provide reasoning among the different entities present in a video. We evaluate our method on three widely used datasets, namely UCF-101, HMDB-51, and SSV2, to demonstrate the efficacy of the proposed approach.
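A rough sketch of the graph-construction step described above, with cosine similarity standing in for the paper's membership function (an assumption on my part) and NumPy arrays of STIP descriptors and frame indices as inputs:

```python
import numpy as np

def stip_graph(descriptors, times, sim_threshold=0.8):
    """Build an adjacency matrix over STIPs.

    descriptors: (N, D) appearance/motion descriptors extracted around each STIP.
    times:       (N,) frame index of each STIP.
    Nodes within the same frame are linked by spatial edges, nodes in
    neighbouring frames by temporal edges, whenever their cosine similarity
    exceeds `sim_threshold` (a stand-in for the paper's membership function).
    """
    norm = descriptors / (np.linalg.norm(descriptors, axis=1, keepdims=True) + 1e-8)
    sim = norm @ norm.T                                        # cosine similarity
    same_frame = times[:, None] == times[None, :]              # spatial candidates
    next_frame = np.abs(times[:, None] - times[None, :]) == 1  # temporal candidates
    adj = ((same_frame | next_frame) & (sim > sim_threshold)).astype(np.float32)
    np.fill_diagonal(adj, 0.0)                                 # no self-loops
    return adj
```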

15 citations


Journal ArticleDOI
TL;DR: Wang et al. propose a YOLO V3 + VGG 16 transfer learning network to realize the automatic recognition, monitoring, and analysis of small-sample data; the recognition accuracy of the proposed method is greater than 96%, and the average deviation of the action execution time is less than 1 s.

12 citations