Author

Du Yihan

Bio: Du Yihan is an academic researcher from Xiamen University. The author has contributed to research on the topics of video tracking and overfitting. The author has an h-index of 2 and has co-authored 2 publications receiving 10 citations.

Papers
Proceedings ArticleDOI
29 Nov 2018
TL;DR: This paper proposes a novel object-adaptive LSTM network, which can effectively exploit sequence dependencies and dynamically adapt to the temporal object variations via constructing an intrinsic model for object appearance and motion and develops an efficient strategy for proposal selection.
Abstract: Convolutional Neural Networks (CNNs) have shown outstanding performance in visual object tracking. However, most classification-based tracking methods using CNNs are time-consuming because of the expensive computation of complex online fine-tuning and massive feature extraction. Besides, these methods suffer from over-fitting, since the training and testing stages of the CNN models are based on videos from the same domain. Recently, matching-based tracking methods (such as Siamese networks) have shown remarkable speed superiority, but they cannot handle target appearance variations and complex scenes well because they inherently lack online adaptability and background information. In this paper, we propose a novel object-adaptive LSTM network, which can effectively exploit sequence dependencies and dynamically adapt to temporal object variations by constructing an intrinsic model for object appearance and motion. In addition, we develop an efficient strategy for proposal selection, where the densely sampled proposals are first pre-evaluated using the fast matching-based method and then the well-selected high-quality proposals are fed to the sequence-specific learning LSTM network. This strategy enables our method to adaptively track an arbitrary object and operate faster than conventional CNN-based classification tracking methods. To the best of our knowledge, this is the first work to apply an LSTM network for classification in visual object tracking. Experimental results on the OTB and TC-128 benchmarks show that the proposed method achieves state-of-the-art performance, which demonstrates the great potential of recurrent structures for visual object tracking.
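The proposal selection strategy described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_and_classify`, the `top_k` parameter, and both scoring callables are hypothetical stand-ins. `match_score` plays the role of the fast matching-based pre-evaluation and `classify_score` plays the role of the costlier sequence-specific classifier.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def select_and_classify(
    proposals: Sequence[Box],
    match_score: Callable[[Box], float],
    classify_score: Callable[[Box], float],
    top_k: int = 5,
) -> Tuple[Box, List[Box]]:
    """Two-stage selection: cheap pre-evaluation, then a costly classifier.

    Both scoring functions are assumptions supplied by the caller; higher
    scores mean a better candidate.
    """
    if not proposals:
        raise ValueError("at least one proposal is required")
    # Stage 1: rank every densely sampled proposal with the cheap score.
    ranked = sorted(proposals, key=match_score, reverse=True)
    kept = ranked[:top_k]
    # Stage 2: only the surviving proposals reach the expensive classifier.
    best = max(kept, key=classify_score)
    return best, kept
```

The point of the design is that the expensive scorer runs on at most `top_k` candidates per frame rather than on every sampled proposal, which is where the speed advantage over purely classification-based trackers would come from.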

6 citations

Patent
11 Sep 2018
TL;DR: In this patent, a target tracking method based on a long short-term memory network is proposed, which consists of three steps: pre-estimating candidate target states with a fast matching method, screening out high-quality candidate target states, and classifying the high-quality candidate target states with an LSTM network.
Abstract: The present invention provides a target tracking method based on a long short-term memory network, and relates to computer vision technology. The method comprises the steps of: performing pre-estimation of candidate target states by employing a fast matching method based on similarity learning, screening out high-quality candidate target states, and performing classification of the high-quality candidate target states by employing a long short-term memory network. The long short-term memory network comprises a convolutional layer used for feature extraction and a long short-term memory layer used for classification. The convolutional layer is obtained through offline training on the large-scale image data set ILSVRC15 to avoid the risk of overfitting to the target tracking data set. The long short-term memory layer is obtained through online learning and fully exploits the temporal correlation contained in the input video sequence, so that it adapts well to changes in target shape and motion. Tracking speed is noticeably improved, and a long short-term memory network capable of adapting to target changes is applied to target tracking.

5 citations


Cited by
Journal ArticleDOI
TL;DR: The authors propose an object-adaptive LSTM network that effectively captures the sequential dependencies in video and adaptively learns object appearance variations for real-time visual tracking.

13 citations

Journal ArticleDOI
06 Jul 2020-Sensors
TL;DR: This paper proposes an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker which exhibits improved generalization compared to the current state-of-the-art trackers and proposes feature fusion from noisy and clean input channels which improves the target localization.
Abstract: CNN-based trackers, especially those based on Siamese networks, have recently attracted considerable attention because of their relatively good performance and low computational cost. For many Siamese trackers, learning a generic object model from a large-scale dataset is still a challenging task. In the current study, we introduce input noise as regularization in the training data to improve generalization of the learned model. We propose an Input-Regularized Channel Attentional Siamese (IRCA-Siam) tracker which exhibits improved generalization compared to the current state-of-the-art trackers. In particular, we exploit offline learning by introducing additive noise for input data augmentation to mitigate the overfitting problem. We propose feature fusion from noisy and clean input channels, which improves the target localization. Channel attention integrated with our framework helps find more useful target features, resulting in further performance improvement. Our proposed IRCA-Siam enhances the discrimination of the tracker/background and improves fault tolerance and generalization. An extensive experimental evaluation on six benchmark datasets, including OTB2013, OTB2015, TC128, UAV123, VOT2016 and VOT2017, demonstrates superior performance of the proposed IRCA-Siam tracker compared to 30 existing state-of-the-art trackers.
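The additive-noise augmentation mentioned in this abstract can be sketched as follows. This is an illustration only: the abstract does not state the noise distribution, so zero-mean Gaussian noise is an assumption here, as are the names `add_noise` and `make_training_pair`, the `sigma` default, and the representation of an image as nested lists of intensities in [0, 1].

```python
import random
from typing import List, Optional, Tuple

Image = List[List[float]]  # rows of pixel intensities in [0, 1]


def add_noise(image: Image, sigma: float = 0.05, seed: Optional[int] = None) -> Image:
    """Return a copy of `image` with additive zero-mean Gaussian noise.

    Gaussian noise is an assumption; results are clipped back to [0, 1].
    """
    rng = random.Random(seed)
    return [
        [min(1.0, max(0.0, pixel + rng.gauss(0.0, sigma))) for pixel in row]
        for row in image
    ]


def make_training_pair(
    image: Image, sigma: float = 0.05, seed: Optional[int] = None
) -> Tuple[Image, Image]:
    """Build (clean, noisy) inputs, e.g. for a two-channel fusion model."""
    return image, add_noise(image, sigma, seed)
```

A pair produced by `make_training_pair` would feed the clean and noisy input channels whose features the abstract describes fusing; the fusion itself and the channel attention are outside the scope of this sketch.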

7 citations

Patent
10 Sep 2019
TL;DR: A trajectory tracking control method comprising the steps of: obtaining first state information of the unmanned vehicle, wherein the first state information indicates a position deviation amount between the unmanned vehicle and a desired trajectory at a first moment; inputting the first state information to a long-term and short-term memory neural network and obtaining a first control amount output by that network; and evaluating the first control amount according to multiple pieces of the first state information predicted within a predicted period of time, and when the evaluation result is favorable, ...
Abstract: The invention discloses a trajectory tracking control method and device, and an unmanned vehicle. The trajectory tracking control method comprises the steps of: obtaining first state information of the unmanned vehicle, wherein the first state information comprises indicating a position deviation amount of the unmanned vehicle and a desired trajectory at a first moment; inputting the first state information to a long-term and short-term memory neural network, and obtaining a first control amount output by the long-term and short-term memory neural network; and evaluating the first control amount according to multiple pieces of the first state information predicted within a predicted period of time, and when the evaluation result is favorable, controlling the unmanned vehicle to perform an action according to the first control amount to implement trajectory tracking. The control amount output by the long-term and short-term memory neural network is employed, and the result output by the long-term and short-term memory neural network is predicted and evaluated to avoid dangerous actions and improve the safety, robustness and stability of trajectory tracking control.
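The predict-then-evaluate step in this abstract can be sketched roughly as below. Everything specific is an assumption made for illustration: `policy` stands in for the memory network, `step_model` is a hypothetical one-step predictor of the position deviation, the "favorable" test is a simple bound on the predicted deviation, and the `fallback` control is invented here because the abstract does not say what happens when the evaluation is unfavorable.

```python
from typing import Callable, List


def predict_deviations(
    deviation: float,
    control: float,
    step_model: Callable[[float, float], float],
    horizon: int,
) -> List[float]:
    """Roll the deviation forward `horizon` steps under a constant control."""
    predicted = []
    current = deviation
    for _ in range(horizon):
        current = step_model(current, control)
        predicted.append(current)
    return predicted


def is_favorable(predicted: List[float], max_deviation: float) -> bool:
    """Assumed criterion: every predicted deviation stays within the bound."""
    return all(abs(d) <= max_deviation for d in predicted)


def gated_control(
    deviation: float,
    policy: Callable[[float], float],
    step_model: Callable[[float, float], float],
    horizon: int = 3,
    max_deviation: float = 1.0,
    fallback: float = 0.0,
) -> float:
    """Apply the policy's control only if its predicted outcome is favorable."""
    control = policy(deviation)
    predicted = predict_deviations(deviation, control, step_model, horizon)
    # The fallback value is a hypothetical safe default, not from the source.
    return control if is_favorable(predicted, max_deviation) else fallback
```

Gating the network output this way means an unsafe suggestion is never executed directly, which matches the abstract's stated aim of avoiding dangerous actions.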

5 citations

Patent
09 Jul 2019
TL;DR: In this patent, a dynamic sign language semantic recognition system and method based on depth images is proposed, in which sign language actions are translated into text or used to control a machine or an operating system, so that hearing-impaired people can be better integrated into social life.
Abstract: The invention provides a dynamic sign language semantic recognition system and method based on a depth image. According to the system and the method, depth image video information of an operator is acquired; the video information is processed to obtain hand joint information; sign language words are analyzed from the joint information; each word is input into a semantic analysis model; and it is judged whether the semantic expression is complete. When the intended expression is complete, the result is output directly or converted into control commands for other control units, so that sign language actions are translated into text, a machine or an operating system is controlled, and hearing-impaired people can be better integrated into social life.

4 citations

Journal ArticleDOI
TL;DR: It is observed that both models of the prior information lead to performance enhancement of all three trackers, which validates the hypothesis that when training videos are available, prior information embodied in the motion models can improve the tracking performance.

1 citation