Bio: Pan Wang is an academic researcher from Wuhan University of Technology. The author has contributed to research in the topics of deep learning and robustness (computer science). The author has an h-index of 1 and has co-authored 1 publication receiving 7 citations.
01 Nov 2018
TL;DR: The MDNet (Multi-Domain Network) deep learning tracking model is combined with a modified Faster R-CNN target detection network to improve the robustness and accuracy of multi-target recognition in dynamic environments.
Abstract: In recent years, China’s urban monitoring network has developed rapidly and has become increasingly intelligent and high-definition. Computer technologies are therefore needed to solve some of the problems that currently exist in target tracking. Urban monitoring images are very complex, with a wide variety of monitored objects present in large numbers, and frequent occlusions occur between objects as they move, which interferes with the accuracy of the recognition algorithm. In this paper, tracking algorithms based on deep learning are studied intensively, and the MDNet (Multi-Domain Network) deep learning tracking model is combined with a modified Faster R-CNN target detection network to improve the robustness and accuracy of multi-target recognition in dynamic environments. The VOT2015 dataset is selected to test the algorithm. The algorithm is compared with the deep-learning-based CF2 tracking algorithm in various environments, and the results show that the proposed algorithm improves both recognition accuracy and real-time performance.
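Combining a tracking model with a detection network, as the abstract describes, typically means refining the tracker's per-frame estimate against the detector's output. A minimal sketch of one common refinement step, IoU-based matching, is below; the function names and the threshold are hypothetical illustrations, not the paper's code:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def refine_with_detections(tracked_box, detections, min_iou=0.3):
    """Snap the tracker's estimate to the best-overlapping detection,
    falling back to the tracker's own box when no detection matches."""
    best = max(detections, key=lambda d: iou(tracked_box, d), default=None)
    if best is not None and iou(tracked_box, best) >= min_iou:
        return best
    return tracked_box
```

Under occlusion the detector may return nothing that overlaps the track, in which case the sketch keeps the tracker's prediction, which is one way such a hybrid gains robustness.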
TL;DR: Compared with CNNs, the results indicate that for sleep stage classification, the temporal information within the data, or within the features extracted from the data, should be considered; LSTM networks take this temporal information into account and thus may be more suitable for sleep stage classification.
Abstract: Sleep stage classification is an open challenge in the field of sleep research. Considering the relatively small size of the datasets used by previous studies, in this paper we used the Sleep Heart Health Study dataset from the National Sleep Research Resource database. A long short-term memory (LSTM) network using the time-frequency spectra of several consecutive 30 s time points as input was used to perform the sleep stage classification. Four classical convolutional neural networks (CNNs), each using the time-frequency spectrum of a single 30 s time point as input, were used for comparison. Results showed that, when considering the temporal information within the time-frequency spectrum of a single 30 s time point, the LSTM network had better classification performance than the CNNs. Moreover, when additional temporal information was taken into consideration, the classification performance of the LSTM network gradually increased. It reached its peak when temporal information from three consecutive 30 s time points was considered, with a classification accuracy of 87.4% and a Cohen’s Kappa coefficient of 0.8216. Compared with CNNs, our results indicate that for sleep stage classification, the temporal information within the data, or within the features extracted from the data, should be considered. LSTM networks take this temporal information into account and thus may be more suitable for sleep stage classification.
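The abstract's key input construction, feeding an LSTM the spectra of several consecutive 30 s epochs rather than one epoch at a time, amounts to a sliding-window stacking step. A minimal sketch of that windowing (array shapes and names are illustrative assumptions, not the paper's pipeline):

```python
import numpy as np

def epoch_windows(spectra, context=3):
    """Stack each 30 s epoch with its successors so a sequence model
    sees `context` consecutive time-frequency spectra per example.

    spectra: array of shape (n_epochs, freq_bins, time_bins)
    returns: array of shape (n_epochs - context + 1, context, freq_bins, time_bins)
    """
    n = spectra.shape[0] - context + 1
    return np.stack([spectra[i:i + context] for i in range(n)])

# e.g. a recording of 10 epochs with a 64 x 30 spectrum per epoch
night = np.random.rand(10, 64, 30)
windows = epoch_windows(night, context=3)   # shape (8, 3, 64, 30)
```

With `context=3` this mirrors the configuration the abstract reports as best (three consecutive 30 s time points); the second axis is the time dimension an LSTM would iterate over.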
25 Nov 2020
TL;DR: Wang et al., as discussed by the authors, designed and implemented three algorithms based on convolutional neural networks (CNNs), namely two-stream CNN, CNN+LSTM, and 3D CNN, to identify human actions in videos.
Abstract: The goal of human action recognition is to identify and understand the actions of people in videos and to export corresponding tags. In addition to the spatial correlations present in 2D images, actions in a video also have attributes in the temporal domain. The complexity of human actions, e.g., changes of perspective and background noise, affects recognition. To address these problems, three algorithms are designed and implemented in this paper: based on convolutional neural networks (CNNs), two-stream CNN, CNN+LSTM, and 3D CNN are harnessed to identify human actions in videos. Each algorithm is explicated and analyzed in detail. The HMDB-51 dataset is used to test these algorithms and obtain the best results. Experimental results show that the three methods effectively identify human actions in a given video, and the best-performing algorithm is then selected.
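Of the three architectures, the two-stream CNN most directly separates the spatial and temporal attributes the abstract mentions: one stream sees an RGB frame, the other a stack of motion fields. A rough sketch of preparing both inputs from a clip; frame differencing here is a crude stand-in for the optical flow normally used, and all names are hypothetical:

```python
import numpy as np

def two_stream_inputs(frames):
    """Prepare inputs for a two-stream network from a video clip.

    frames: (T, H, W, 3) array of RGB frames.
    Returns (spatial, temporal):
      spatial  -- one RGB frame (the middle of the clip), shape (H, W, 3)
      temporal -- stacked grayscale frame differences, shape (H, W, T-1),
                  a simple proxy for an optical-flow stack.
    """
    frames = frames.astype(np.float32)
    spatial = frames[len(frames) // 2]
    gray = frames.mean(axis=-1)                            # (T, H, W)
    temporal = np.moveaxis(gray[1:] - gray[:-1], 0, -1)    # (H, W, T-1)
    return spatial, temporal
```

Each output would then feed its own CNN, with the two streams' class scores fused late; a static clip yields an all-zero temporal stack, which is why the temporal stream contributes only motion information.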
TL;DR: A framework that combines You Only Look Once version 3 (YOLOv3) with the SORT algorithm is proposed to perform multi-target tracking in the tracking-by-detection paradigm; it supports real-time detection of the working condition and has strong practical applications.
Abstract: Traditional oil well monitoring relies on manual acquisition and various high-precision sensors. Using the indicator diagram to judge the working condition of a well is not only difficult to establish but also consumes enormous manpower and financial resources. This paper proposes the use of computer vision to detect working conditions in oil extraction. Combined with the advantages of an unmanned aerial vehicle (UAV), UAV aerial images are used to realize real-time detection of on-site working conditions by tracking, in real time, the working status of the head and other related parts of the pumping unit. Considering the real-time requirements of working-condition detection, this paper proposes a framework that combines You Only Look Once version 3 (YOLOv3) and the SORT algorithm to perform multi-target tracking in the tracking-by-detection paradigm. The quality of target detection within the framework is the key factor affecting tracking performance. The experimental results show that a good detector allows the tracking speed to reach real time and supports real-time detection of the working condition, giving the approach strong practical applicability.
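In a tracking-by-detection pipeline like the one described, each frame's detections must be associated with existing tracks. The sketch below uses greedy IoU matching; the full SORT algorithm instead runs the Hungarian algorithm against Kalman-predicted boxes, so this is a simplified illustration with hypothetical names:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.
    tracks: {track_id: box}; detections: list of boxes.
    Returns (matches as {track_id: detection_index}, unmatched indices)."""
    pairs = sorted(((iou(t, d), tid, di)
                    for tid, t in tracks.items()
                    for di, d in enumerate(detections)), reverse=True)
    matches, used_t, used_d = {}, set(), set()
    for score, tid, di in pairs:
        if score < min_iou:
            break
        if tid in used_t or di in used_d:
            continue
        matches[tid] = di
        used_t.add(tid)
        used_d.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched
```

Unmatched detections would seed new tracks, and tracks left unmatched for several frames would be dropped, which is why detector quality dominates the tracking result, as the abstract notes.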
TL;DR: Wang et al., as discussed by the authors, explored a pragmatic approach to studying the real-time performance of a multiway concurrent multi-object tracking (MOT) system and proposed a new MOT framework, based on a tracking-by-detection (TBD) model, to handle multiway concurrency scenarios.
Abstract: This paper explores a pragmatic approach to studying the real-time performance of a multiway concurrent multi-object tracking (MOT) system. At present, most research focuses on tracking single image sequences, but in practical applications, multiway video streams need to be processed in parallel by MOT systems, and there have been few studies on the real-time performance of such multiway concurrent systems. In this paper, we propose a new MOT framework, based on a tracking-by-detection (TBD) model, to handle multiway concurrency scenarios. The new framework focuses mainly on concurrency and real-time performance under limited computing and storage resources, while also considering algorithm performance. For the former, three aspects were studied: (1) expanded width and depth of the tracking-by-detection model: in terms of width, the MOT system can process multiway video sequences at the same time; in terms of depth, image collectors and bounding-box collectors were introduced to support batch processing; (2) considering real-time performance and multiway concurrency, we propose a real-time MOT algorithm based on directly driven detection; (3) system-level optimization: we also utilize the inference-optimization features of NVIDIA TensorRT to accelerate the deep neural network (DNN) in the tracking algorithm. To trade off algorithm performance, a negative-sample (false-detection) filter was designed to preserve tracking accuracy. Meanwhile, the factors that affect the system's real-time performance and concurrency were studied. Experimental results show that our method performs well when processing multiple concurrent real-time video streams.
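The "image collector" idea in aspect (1), gathering frames from several streams so the DNN runs one batched inference instead of many single-frame calls, can be sketched as a small buffering class. The class name, API, and batching policy here are assumptions for illustration, not the paper's implementation:

```python
from collections import deque

class ImageCollector:
    """Gather frames arriving from several video streams and release
    them in fixed-size batches for a single batched inference call."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = deque()          # (stream_id, frame) pairs in arrival order

    def push(self, stream_id, frame):
        """Queue one frame; return a full batch once enough have arrived,
        otherwise None. Stream ids are kept so results can be routed back."""
        self.buffer.append((stream_id, frame))
        if len(self.buffer) >= self.batch_size:
            return [self.buffer.popleft() for _ in range(self.batch_size)]
        return None
```

Batching across streams trades a little per-frame latency for much higher GPU utilization, which is the usual motivation for pairing such collectors with a TensorRT-accelerated detector.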
TL;DR: A new framework is proposed to combine the digital signage with a depth camera for tracking multi-face in the three-dimensional (3D) environment and extracts the audience's face centroid position and depth information and plots into the aerial map to simulate the crowd's movement that is corresponding to the real-world environment.
Abstract: Digital signage is widely utilized in digital-out-of-home (DOOH) advertising for marketing and business. Recently, the combination of a digital camera and digital signage has enabled advertisers to gather audience demographics for audience measurement. Audience measurement helps advertisers understand audience behavior and improve their business strategies. When an audience member faces the digital display, the vision-based DOOH system processes the person's face and broadcasts a personalized advertisement. Most digital signage is deployed in uncontrolled public environments, which poses two main challenges for a vision-based DOOH system tracking the audience's movement: multiple adjacent faces, and occlusion by passers-by. In this paper, a new framework is proposed that combines digital signage with a depth camera to track multiple faces in a three-dimensional (3D) environment. The proposed framework extracts each audience member's face centroid position (x, y) and depth (z) and plots them onto an aerial map to simulate the audience's movement in correspondence with the real-world environment. The advertiser can then measure advertising effectiveness through the audience's behavior.
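Mapping a face centroid (x, y) plus depth z onto an aerial map, as the framework does, is commonly done by back-projecting through a pinhole camera model. The abstract does not specify the projection used, so the sketch below, including the made-up intrinsic parameters, is only one plausible realization:

```python
def face_to_aerial(x, y, z, fx=600.0, fy=600.0, cx=320.0, cy=240.0):
    """Back-project a face centroid (pixel x, y) with depth z (metres)
    through a pinhole model, keeping the ground-plane pair for the map.

    fx, fy: focal lengths in pixels; cx, cy: principal point.
    These intrinsics are example values, not calibrated ones.
    Returns (lateral offset, distance from camera) in metres."""
    lateral = (x - cx) * z / fx    # sideways offset from the camera axis
    height = (y - cy) * z / fy     # vertical offset (unused on an aerial map)
    return lateral, z
```

A face centred in the image maps straight ahead of the display; as it drifts left or right in the frame, the aerial-map point moves proportionally to its depth, which is what lets the plot mirror real-world crowd movement.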