Author
Na Jiang
Bio: Na Jiang is an academic researcher whose work spans computer science and artificial intelligence. The author has co-authored 4 publications.
Papers
TL;DR: In this article, the authors propose event-based sign language gesture datasets and classify them with spiking neural networks trained by a spatio-temporal back-propagation method, reaching a best recognition accuracy of 77% and paving the way for combining event camera-based sign language gesture recognition with robotic perception.
Abstract: Sign language recognition has been utilized in human–machine interaction, improving the lives of people with speech impairments or who rely on nonverbal instructions. Thanks to its higher temporal resolution, lower visual redundancy and lower energy consumption, an event camera with a dynamic vision sensor (DVS) shows promise for sign language recognition in robot perception and intelligent control. Previous work has focused on simple event camera gesture datasets such as DVS128Gesture; the lack of event camera gesture datasets inspired by sign language poses a great impediment to the development of event camera-based sign language recognition, and an effective method to extract spatio-temporal features from event data is strongly desired. Firstly, event-based sign language gesture datasets are proposed, with data from two sources: traditional sign language videos converted to event streams (DVS_Sign_v2e) and a DAVIS346 camera (DVS_Sign). The data are divided into five classes, verbs, quantifiers, positions, things and people, adapted to actual scenarios where robots provide instruction or assistance. Sign language classification is demonstrated with spiking neural networks trained by a spatio-temporal back-propagation method, reaching a best recognition accuracy of 77%. This work paves the way for combining event camera-based sign language gesture recognition with robotic perception in future intelligent systems.
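A minimal sketch of the training idea the abstract names, spatio-temporal back-propagation through a spiking network via a surrogate gradient, is given below. It is not the authors' implementation: the layer sizes, the 20 time steps, the rectangular surrogate window and the 128x128 input resolution are all illustrative assumptions (the DVS_Sign data come from a 346x260 DAVIS346 sensor).

```python
# Illustrative STBP-style spiking classifier; sizes and constants are assumptions.
import torch
import torch.nn as nn

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; rectangular surrogate gradient backward."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v >= 1.0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Gradient flows only near the firing threshold of 1.0.
        return grad_out * (torch.abs(v - 1.0) < 0.5).float()

class SNNClassifier(nn.Module):
    """Two-layer leaky integrate-and-fire network unrolled over time; 5-way output."""
    def __init__(self, in_features=128 * 128, hidden=256, n_classes=5, time_steps=20):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)
        self.time_steps = time_steps
        self.decay = 0.5  # membrane leak factor (assumed)

    def forward(self, x):
        # x: (batch, time_steps, in_features) event frames binned from the DVS stream
        b = x.shape[0]
        v1 = torch.zeros(b, self.fc1.out_features, device=x.device)
        s1 = torch.zeros_like(v1)
        v2 = torch.zeros(b, self.fc2.out_features, device=x.device)
        logits = torch.zeros_like(v2)
        for t in range(self.time_steps):
            # Leaky integration with reset-by-spike, then threshold crossing.
            v1 = self.decay * v1 * (1.0 - s1) + self.fc1(x[:, t])
            s1 = SurrogateSpike.apply(v1)
            v2 = self.decay * v2 + self.fc2(s1)
            logits = logits + v2
        return logits / self.time_steps  # averaged output potential as class scores

# Usage: random sparse event frames, standard cross-entropy training step.
model = SNNClassifier()
x = (torch.rand(8, 20, 128 * 128) < 0.05).float()
loss = nn.functional.cross_entropy(model(x), torch.randint(0, 5, (8,)))
loss.backward()
```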
TL;DR: In this article, an improved event-based self-attention optical flow estimation network (SA-FlowNet) is proposed that independently uses criss-cross and temporal self-attention mechanisms, directly capturing long-range dependencies and efficiently extracting temporal and spatial features from the event streams.
Abstract: Inspired by biological vision mechanisms, event-based cameras capture continuous object motion and detect brightness changes independently and asynchronously, overcoming the limitations of traditional frame-based cameras. Complementarily, spiking neural networks (SNNs) offer asynchronous computation and exploit the inherent sparseness of spatio-temporal events. Event-based pixel-wise optical flow estimation computes the positions and relationships of objects across adjacent frames; however, because event camera outputs are sparse and uneven, dense scene information is difficult to generate, and the local receptive fields of a conventional neural network lead to poor tracking of moving objects. To address these issues, an improved event-based self-attention optical flow estimation network (SA-FlowNet) is proposed that independently uses criss-cross and temporal self-attention mechanisms, directly capturing long-range dependencies and efficiently extracting temporal and spatial features from the event streams. In the former mechanism, a cross-domain attention scheme that dynamically fuses temporal-spatial features is introduced. The network adopts a spiking-analogue neural network architecture with an end-to-end learning method and gains significant computational energy benefits, especially for SNNs. State-of-the-art error rates for optical flow prediction on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset are demonstrated in comparison with current SNN-based approaches.
04 Jun 2023
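The criss-cross self-attention named in the SA-FlowNet abstract can be sketched as follows. This is an illustrative reconstruction of the generic criss-cross attention idea, not the paper's network; the channel reduction factor, the learned residual weight gamma and the feature-map sizes in the example are assumptions.

```python
# Illustrative criss-cross self-attention block; not the SA-FlowNet implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Each position attends only to its own row and column, so a query sees
    O(H+W) keys instead of O(HW) while still capturing long-range dependencies."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight (assumed)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Affinity of each query (i, j) with every key in row i and in column j.
        e_row = torch.einsum('bchw,bchj->bhwj', q, k)   # (b, h, w, w)
        e_col = torch.einsum('bchw,bciw->bhwi', q, k)   # (b, h, w, h)
        attn = F.softmax(torch.cat([e_row, e_col], dim=-1), dim=-1)
        a_row, a_col = attn[..., :w], attn[..., w:]
        # Aggregate values along the row and the column, then residual-add.
        out = (torch.einsum('bhwj,bchj->bchw', a_row, v) +
               torch.einsum('bhwi,bciw->bchw', a_col, v))
        return self.gamma * out + x

# Usage: a 64-channel feature map from an event encoder (sizes illustrative).
feat = torch.randn(2, 64, 32, 48)
print(CrissCrossAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 48])
```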
TL;DR: In this article, an L0-norm based Hankel-structured low-rank and sparse model is proposed to reduce the amount of sampled channel data in single plane wave imaging, together with an ADMM-based algorithm to solve the resulting optimization problem.
Abstract: Ultrasound plane wave imaging is widely used in many applications thanks to its capability of reaching high frame rates. However, the amount of data acquired and stored in a period of time can become a bottleneck in ultrasound system design at thousands of frames per second. In our previous study, we proposed a low-rank and joint-sparse model to reduce the amount of sampled channel data in focused beam imaging by considering all the received data as a 2D matrix. However, for a single plane wave transmission, the number of channels is limited and the low-rank property of the received data matrix is no longer achieved. In this study, an L0-norm based Hankel-structured low-rank and sparse model is proposed to reduce the channel data. An optimization algorithm based on the alternating direction method of multipliers (ADMM) is proposed to efficiently solve the resulting optimization problem. The performance of the proposed approach was evaluated using the data published in the Plane Wave Imaging Challenge in Medical Ultrasound (PICMUS) in 2016. Results on channel and plane wave data show that the proposed method is better adapted to the ultrasound channel signal and can recover the image with fewer samples than the conventional compressed sensing (CS) method.
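The core idea, a low-rank plus sparse decomposition with an L0-style penalty solved by ADMM, can be illustrated with a minimal sketch. This omits the paper's Hankel structuring step and uses elementwise and singular-value hard thresholding as the nonconvex L0 surrogates; the weights lam and mu and the stopping rule are assumptions, not the paper's settings.

```python
# Illustrative low-rank + sparse splitting via ADMM-style alternation (assumptions noted above).
import numpy as np

def hard_threshold(a, tau):
    """L0-style shrinkage: zero out entries with magnitude below tau."""
    out = a.copy()
    out[np.abs(out) < tau] = 0.0
    return out

def lowrank_sparse_admm(X, lam=0.1, mu=1.0, n_iters=100, tol=1e-6):
    """Split X into L (low-rank) + S (sparse) under the constraint X = L + S."""
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # scaled dual variable for the splitting constraint
    for _ in range(n_iters):
        # L-update: hard-threshold singular values (nonconvex rank surrogate).
        U, s, Vt = np.linalg.svd(X - S + Y, full_matrices=False)
        L = U @ np.diag(hard_threshold(s, 1.0 / mu)) @ Vt
        # S-update: elementwise hard thresholding enforces the L0 sparsity term.
        S = hard_threshold(X - L + Y, lam / mu)
        # Dual ascent on the residual of X = L + S.
        residual = X - L - S
        Y = Y + residual
        if np.linalg.norm(residual) < tol * max(np.linalg.norm(X), 1.0):
            break
    return L, S

# Usage: recover a rank-2 matrix corrupted by a few large spikes.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
S0 = np.zeros_like(X0); S0[rng.random(X0.shape) < 0.02] = 5.0
L, S = lowrank_sparse_admm(X0 + S0)
```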
TL;DR: In this paper, a double threshold filter with Sigmoid eHarris (DTFS-eHarris) and an asynchronous corner tracker are proposed to reduce redundant or wrong corners.
Abstract: The event camera, a new bio-inspired vision sensor with low latency and high temporal resolution, has shown great potential and promising applications in machine vision and artificial intelligence. Corner detection is a key step in object motion estimation and tracking. However, most existing event-based corner detectors, such as G-eHarris and Arc*, produce a huge number of redundant or wrong corners and cannot strike a balance between accuracy and real-time performance, especially in complex, highly textured scenes that require higher computational costs. To address these issues, we propose an asynchronous corner detection method: a double threshold filter with Sigmoid eHarris (DTFS-eHarris) and an asynchronous corner tracker. The main contributions are a double threshold filter designed to reduce redundant events and an improved Sigmoid function used to represent the Surface of Active Events (Sigmoid*-SAE). We selected four scenes (shapes, dynamic, poster and boxes) from the public event camera dataset DAVIS240C for comparison with the existing state-of-the-art hybrid method; our method shows more than a 10% reduction in false positive rate and 5% and 20% improvements in accuracy and throughput, respectively. The evaluations indicate that DTFS-eHarris yields a significant improvement, especially in complex scenes, and is thus anticipated to enhance real-time performance and feature detection accuracy for future robotic applications.
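The two ingredients the abstract names, a Sigmoid-normalized Surface of Active Events and a double-threshold decision around a Harris response, can be sketched as follows. This is an assumed reconstruction, not the released DTFS-eHarris code: the time constant tau, patch radius r and the two thresholds low/high are illustrative, and the asynchronous tracker is omitted.

```python
# Illustrative event-driven corner check on a Sigmoid-normalized SAE (values assumed).
import numpy as np

def sigmoid_sae(patch, t_now, tau=0.05):
    """Sigmoid-normalized Surface of Active Events: map each pixel's last event
    timestamp to (0, 1), so recent activity saturates near 1 and stale near 0."""
    return 1.0 / (1.0 + np.exp((t_now - patch) / tau))

def harris_score(patch, k=0.04):
    """Classic Harris response computed on the normalized SAE patch."""
    gy, gx = np.gradient(patch)
    a, b, c = (gx * gx).sum(), (gx * gy).sum(), (gy * gy).sum()
    return (a * c - b * b) - k * (a + c) ** 2

def process_event(sae, x, y, t, r=4, low=0.05, high=0.5):
    """Update the SAE with one event and apply a double-threshold decision:
    responses below `low` are discarded as redundant, above `high` accepted;
    the band in between would go to the (omitted) asynchronous tracker."""
    sae[y, x] = t  # SAE keeps the most recent timestamp per pixel
    if y < r or x < r or y + r >= sae.shape[0] or x + r >= sae.shape[1]:
        return False  # ignore events too close to the border
    score = harris_score(sigmoid_sae(sae[y - r:y + r + 1, x - r:x + r + 1], t))
    if score < low:
        return False
    return score >= high

# Usage on a synthetic 240x180 sensor (the DAVIS240C resolution):
sae = np.zeros((180, 240))
print(process_event(sae, x=120, y=90, t=0.01))
```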