Author

Yuehua Zhao

Bio: Yuehua Zhao is an academic researcher from Hebei University of Technology. The author has contributed to research in the topics Point cloud and Computer science, has an h-index of 1, and has co-authored 2 publications receiving 5 citations.

Papers
Journal ArticleDOI
04 Mar 2021-Agronomy
TL;DR: The test results show that the EfficientNet-B0-YOLOv4 model proposed in this paper has better detection performance than YOLOv3, YOLOv4, and Faster R-CNN with ResNet, which are state-of-the-art apple detection models.
Abstract: To enable the apple picking robot to quickly and accurately detect apples against the complex backgrounds found in orchards, we propose an improved You Only Look Once version 4 (YOLOv4) model and data augmentation methods. Firstly, crawler technology is utilized to collect pertinent apple images from the Internet for labeling. To address the problem of insufficient image data caused by the random occlusion between leaves, in addition to traditional data augmentation techniques, a leaf illustration data augmentation method is proposed in this paper. Secondly, because of the large size and computational cost of the YOLOv4 model, its backbone network, Cross Stage Partial Darknet53 (CSPDarknet53), is replaced by EfficientNet, and a convolution layer (Conv2D) is added to each of the three outputs to further adjust and extract the features, which makes the model lighter and reduces the computational complexity. Finally, the apple detection experiment is performed on 2670 expanded samples. The test results show that the EfficientNet-B0-YOLOv4 model proposed in this paper has better detection performance than YOLOv3, YOLOv4, and Faster R-CNN with ResNet, which are state-of-the-art apple detection models. The average values of Recall, Precision, and F1 reach 97.43%, 95.52%, and 96.54%, respectively, and the average detection time per frame is 0.338 s, which shows that the proposed method can be applied well in the vision systems of picking robots in the apple industry.
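As a rough illustration of the backbone swap the abstract describes, the sketch below taps three scales of torchvision's EfficientNet-B0 and adds a Conv2D adjustment layer to each output before the detection heads. The stage indices, channel widths, and 1x1 kernel are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal PyTorch sketch: EfficientNet-B0 replaces CSPDarknet53 and an extra
# Conv2D refines each of the three feature maps fed to the YOLO heads.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class EfficientNetB0Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        stages = efficientnet_b0(weights=None).features
        # Tap three scales (strides 8, 16, 32) for multi-scale detection.
        self.stage_p3 = stages[:4]   # -> 40 channels, stride 8
        self.stage_p4 = stages[4:6]  # -> 112 channels, stride 16
        self.stage_p5 = stages[6:]   # -> 1280 channels, stride 32
        # Extra Conv2D on each output to further adjust the features.
        self.adjust = nn.ModuleList(
            nn.Conv2d(c, c, kernel_size=1) for c in (40, 112, 1280))

    def forward(self, x):
        p3 = self.stage_p3(x)
        p4 = self.stage_p4(p3)
        p5 = self.stage_p5(p4)
        return [conv(p) for conv, p in zip(self.adjust, (p3, p4, p5))]

feats = EfficientNetB0Backbone()(torch.randn(1, 3, 416, 416))
print([f.shape for f in feats])  # stride-8/16/32 feature maps
```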

35 citations

Journal ArticleDOI
TL;DR: COPRNet, as discussed by the authors, combines spatial structure information to encode unique geometric embeddings, greatly enhancing feature perception, and includes a corresponding point matching module with a two-stage point filtering strategy.
Abstract: Three-dimensional (3D) point cloud registration is a critical topic in 3D computer vision and remote sensing. Several algorithms based on deep learning techniques have recently tried to deal with indoor partial-to-partial point cloud registration by searching the correspondences between input point clouds. However, existing correspondence-based methods are vulnerable to noise and do not adequately exploit geometric information to extract features, resulting in incorrect correspondences. In this work, we develop a novel network using correspondence confidence and overlap scores to address these challenges. Specifically, we first introduce a feature interaction module that combines spatial structure information to encode unique geometric embeddings, greatly enhancing feature perception. Furthermore, we design a corresponding point matching module, which includes a two-stage point filtering strategy. This method effectively improves the ability to identify embedded inliers from outliers and accurately remove spurious matches, thus allowing the network to focus more on the accurate correspondences of overlapping regions. Extensive experiments on different benchmark datasets indicate that our network shows superior performance on indoor point cloud registration, especially in low-overlap registration, with significant improvement over state-of-the-art (SOTA) methods. Our code can be found at https://github.com/tranceok/COPRNet.
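The sketch below illustrates the general idea of a two-stage point filtering strategy of the kind the abstract describes: stage one keeps mutual nearest neighbours in feature space, stage two drops matches whose similarity falls below a confidence threshold. The scoring and threshold are illustrative assumptions, not COPRNet's exact formulation.

```python
# NumPy sketch of two-stage correspondence filtering between two point clouds,
# given per-point feature descriptors feat_src and feat_tgt.
import numpy as np

def two_stage_matches(feat_src, feat_tgt, conf_thresh=0.7):
    # Cosine similarity between L2-normalised per-point descriptors.
    a = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    b = feat_tgt / np.linalg.norm(feat_tgt, axis=1, keepdims=True)
    sim = a @ b.T
    # Stage 1: mutual nearest-neighbour test rejects one-sided matches.
    nn_src = sim.argmax(axis=1)
    nn_tgt = sim.argmax(axis=0)
    src_idx = np.arange(len(feat_src))
    mutual = nn_tgt[nn_src] == src_idx
    # Stage 2: confidence filtering removes low-similarity (likely spurious) pairs.
    conf = sim[src_idx, nn_src]
    keep = mutual & (conf > conf_thresh)
    return np.stack([src_idx[keep], nn_src[keep]], axis=1), conf[keep]

rng = np.random.default_rng(0)
pairs, conf = two_stage_matches(rng.normal(size=(100, 32)),
                                rng.normal(size=(120, 32)))
print(pairs.shape)  # (n_kept, 2) index pairs into the two clouds
```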
Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a model named Spatial Semantic Incorporation Network (SSI-Net) for real-time large-scale point cloud segmentation, which adopts a random sample-based hierarchical network structure.
Abstract: Real-time large-scale point cloud segmentation is an important but challenging task for practical applications such as remote sensing and robotics. Existing real-time methods have achieved acceptable performance by aggregating local information. However, most of them exploit only local spatial geometric or semantic information independently, and few consider the complementarity of both. In this paper, we propose a model named Spatial–Semantic Incorporation Network (SSI-Net) for real-time large-scale point cloud segmentation. A Spatial-Semantic Cross-correction (SSC) module is introduced in SSI-Net as a basic unit. High-quality contextual features can be learned through SSC by correcting and updating high-level semantic information using spatial geometric cues and vice versa. Adopting the plug-and-play SSC module, we design SSI-Net as an encoder–decoder architecture. To ensure efficiency, it also adopts a random sample-based hierarchical network structure. Extensive experiments on several prevalent indoor and outdoor datasets for point cloud semantic segmentation demonstrate that the proposed approach can achieve state-of-the-art performance.
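A minimal sketch of the cross-correction idea is given below: each stream (spatial, semantic) gates the other through a small learned correction. The channel sizes and the sigmoid-gated residual form are assumptions for illustration, not the paper's exact SSC module.

```python
# PyTorch sketch: spatial features correct semantic features and vice versa.
import torch
import torch.nn as nn

class SpatialSemanticCrossCorrection(nn.Module):
    def __init__(self, c_spatial=16, c_semantic=64):
        super().__init__()
        self.sem_from_spa = nn.Sequential(
            nn.Linear(c_spatial + c_semantic, c_semantic), nn.Sigmoid())
        self.spa_from_sem = nn.Sequential(
            nn.Linear(c_spatial + c_semantic, c_spatial), nn.Sigmoid())

    def forward(self, spatial, semantic):
        joint = torch.cat([spatial, semantic], dim=-1)
        # Each branch gates (corrects) the other stream's features.
        semantic = semantic * self.sem_from_spa(joint)
        spatial = spatial * self.spa_from_sem(joint)
        return spatial, semantic

spa, sem = SpatialSemanticCrossCorrection()(torch.randn(2, 4096, 16),
                                            torch.randn(2, 4096, 64))
print(spa.shape, sem.shape)  # per-point corrected features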
Journal ArticleDOI
TL;DR: In this article, a geometry-guided instance-aware prior and multi-stage reconstruction networks are proposed to estimate category-level 6D pose from point clouds, which avoids interference from unstable RGB data.
Abstract: Category-level object 6D pose estimation is essential for robotic manipulation, augmented reality and 3D scene understanding. It aims to accurately predict the translation and rotation of arbitrary shape instances from a given set of object classes without models of each instance. However, such estimation is cumbersome owing to the intra-class variation and the difficulty of accurately regressing dense correspondences between an observed point cloud and the reconstructed instance model from high-order and complex features. In this study, we propose a framework comprising geometry-guided instance-aware prior and multi-stage reconstruction networks to address these challenges. The geometry-guided instance-aware prior network can avoid unstable RGB data interference and robustly learn implicit relations on semantic and geometric information between the observed point cloud and prior. We take advantage of these potential relations using a transformer network to cope with the intra-class variation. Furthermore, the proposed multi-stage reconstruction network is designed to learn the residual of the preliminary reconstruction and improve the accuracy of predicting dense correspondences originating from the complex feature. The results of extensive experiments on a well-acknowledged benchmark for category-level 6D pose estimation demonstrate that our proposed method achieves a significant improvement in performance compared to previous methods.
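Once dense correspondences between the observed point cloud and the reconstructed instance model are available, the translation and rotation can be recovered with a closed-form least-squares fit. The sketch below shows the standard Kabsch/Umeyama step on synthetic matched points; it illustrates only the final pose recovery, not the paper's networks.

```python
# NumPy sketch: rigid pose (R, t) from matched 3D point pairs via SVD.
import numpy as np

def fit_rigid_transform(src, tgt):
    """Least-squares R, t with tgt ~= src @ R.T + t for matched points."""
    src_c, tgt_c = src - src.mean(0), tgt - tgt.mean(0)
    u, _, vt = np.linalg.svd(tgt_c.T @ src_c)   # cross-covariance SVD
    d = np.sign(np.linalg.det(u @ vt))          # guard against reflections
    r = u @ np.diag([1.0, 1.0, d]) @ vt
    t = tgt.mean(0) - r @ src.mean(0)
    return r, t

rng = np.random.default_rng(1)
model = rng.normal(size=(500, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
r_true = q * np.sign(np.linalg.det(q))          # proper rotation, det = +1
observed = model @ r_true.T + np.array([0.1, -0.2, 0.3])
r, t = fit_rigid_transform(model, observed)
print(np.allclose(r, r_true), np.round(t, 3))   # True [ 0.1 -0.2  0.3]
```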

Cited by
Journal ArticleDOI
14 Jul 2021-Sensors
TL;DR: In this paper, a robust real-time pear fruit counter for mobile applications was presented, using only RGB data, variants of the state-of-the-art object detection model YOLOv4, and the multiple-object-tracking algorithm Deep SORT.
Abstract: This study aimed to produce a robust real-time pear fruit counter for mobile applications using only RGB data, the variants of the state-of-the-art object detection model YOLOv4, and the multiple object-tracking algorithm Deep SORT. This study also provided a systematic and pragmatic methodology for choosing the most suitable model for a desired application in agricultural sciences. In terms of accuracy, YOLOv4-CSP was observed as the optimal model, with an AP@0.50 of 98%. In terms of speed and computational cost, YOLOv4-tiny was found to be the ideal model, with a speed of more than 50 FPS and FLOPS of 6.8–14.5. If considering the balance in terms of accuracy, speed and computational cost, YOLOv4 was found to be most suitable and had the highest accuracy metrics while satisfying a real-time speed of greater than or equal to 24 FPS. Between the two methods of counting with Deep SORT, the unique ID method was found to be more reliable, with an F1count of 87.85%. This was because YOLOv4 had a very low false-negative rate in detecting pear fruits. The ROI line method, though more restrictive by nature, was not able to count some pears despite their being detected, due to flickering in the detections.
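The two counting schemes compared above can be illustrated directly on tracker output. In the sketch below, each frame yields (track_id, y_center) pairs; the unique-ID method counts every identity ever seen, while the ROI-line method counts a track only when its centre crosses a virtual line. The frame data and line position are synthetic stand-ins.

```python
# Python sketch of unique-ID counting vs. ROI-line counting over track output.
def count_unique_ids(frames):
    return len({tid for frame in frames for tid, _ in frame})

def count_roi_crossings(frames, line_y=240.0):
    last_y, counted = {}, set()
    for frame in frames:
        for tid, y in frame:
            prev = last_y.get(tid)
            # Count once, the first time a track crosses the line downward.
            if prev is not None and prev < line_y <= y and tid not in counted:
                counted.add(tid)
            last_y[tid] = y
    return len(counted)

frames = [[(1, 100.0), (2, 230.0)], [(1, 150.0), (2, 250.0)], [(3, 300.0)]]
print(count_unique_ids(frames), count_roi_crossings(frames))  # 3, 1
```

The example also shows why the ROI line can undercount: track 3 is detected but never observed crossing the line, so only the unique-ID method counts it.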

59 citations

Journal ArticleDOI
24 Jun 2022-Agronomy
TL;DR: In this article, a long-close distance coordination control strategy for a litchi picking robot was proposed based on an Intel RealSense D435i camera combined with a point cloud map collected by the camera.
Abstract: For the automated robotic picking of bunch-type fruit, the strategy is to roughly determine the location of the bunches, plan the picking route from a remote location, and then locate the picking point precisely at a more appropriate, closer location. The latter can reduce the amount of information to be processed and obtain more precise and detailed features, thus improving the accuracy of the vision system. In this study, a long-close distance coordination control strategy for a litchi picking robot was proposed based on an Intel RealSense D435i camera combined with a point cloud map collected by the camera. The YOLOv5 object detection network and DBSCAN point cloud clustering method were used to determine the location of bunch fruits at a long distance to then deduce the sequence of picking. After reaching the close-distance position, the Mask RCNN instance segmentation method was used to segment the more distinctive bifurcate stems in the field of view. By processing the segmentation masks, a dual reference model of “Point + Line” was proposed to guide picking by the robotic arm. Compared with existing studies, this strategy takes into account the advantages and disadvantages of depth cameras. In experiments on the complete process, the long-distance density-clustering approach was able to distinguish different bunches, and a success rate of 88.46% was achieved in locating fruit-bearing branches. This exploratory work provides a theoretical and technical reference for future research on fruit-picking robots.
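The long-distance step can be sketched with scikit-learn's DBSCAN: 3D centroids of detected fruit points are grouped into bunches by density, with isolated points rejected as noise. The eps and min_samples values and the synthetic point data below are illustrative assumptions; the paper's parameters are not given here.

```python
# Python sketch: DBSCAN clustering of fruit points into bunches.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
# Two synthetic "bunches" of fruit points plus one stray outlier (in metres).
bunch_a = rng.normal([0.0, 0.0, 1.5], 0.03, size=(40, 3))
bunch_b = rng.normal([0.4, 0.1, 1.6], 0.03, size=(35, 3))
points = np.vstack([bunch_a, bunch_b, [[1.0, 1.0, 2.0]]])

labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(points)
n_bunches = len(set(labels) - {-1})  # label -1 marks DBSCAN noise
print(f"{n_bunches} bunches found; noise points: {(labels == -1).sum()}")
```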

32 citations

Journal ArticleDOI
TL;DR: This method takes YOLOX-Tiny as the baseline and uses the lightweight network Shufflenetv2, augmented with the convolutional block attention module (CBAM), as the backbone to improve the detection accuracy, and only two extraction layers are used to simplify the network structure.
Abstract: In order to enable the picking robot to detect and locate apples quickly and accurately in the orchard natural environment, we propose an apple object detection method based on Shufflenetv2-YOLOX. This method takes YOLOX-Tiny as the baseline and uses the lightweight network Shufflenetv2, augmented with the convolutional block attention module (CBAM), as the backbone. An adaptive spatial feature fusion (ASFF) module is added to the PANet network to improve the detection accuracy, and only two extraction layers are used to simplify the network structure. The average precision (AP), precision, recall, and F1 of the trained network on the validation set are 96.76%, 95.62%, 93.75%, and 0.95, respectively, and the detection speed reaches 65 frames per second (FPS). The test results show that the AP value of Shufflenetv2-YOLOX is increased by 6.24% compared with YOLOX-Tiny, and the detection speed is increased by 18%. At the same time, it achieves better detection accuracy and speed than the advanced lightweight networks YOLOv5-s, Efficientdet-d0, YOLOv4-Tiny, and Mobilenet-YOLOv4-Lite. Meanwhile, the half-precision floating-point (FP16) model on the embedded device Jetson Nano with TensorRT acceleration can reach 26.3 FPS. This method can provide an effective solution for the vision system of the apple picking robot.
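The CBAM block mentioned above is a well-known attention module (channel attention followed by spatial attention). The sketch below follows the common CBAM defaults (reduction ratio 16, 7x7 spatial kernel), which may differ from this paper's settings; the channel count in the usage line is an arbitrary example.

```python
# PyTorch sketch of a CBAM block: channel attention, then spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2)

    def forward(self, x):
        # Channel attention from shared MLP over avg- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # Spatial attention from channel-wise mean/max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

print(CBAM(116)(torch.randn(1, 116, 52, 52)).shape)  # shape is preserved
```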

20 citations

Journal ArticleDOI
TL;DR: In this article, a DA-YOLOv7 model was proposed to detect Camellia oleifera fruit in complex field scenes, which achieved the best detection performance and a strong generalisation ability in complex scenes.
Abstract: Rapid and accurate detection of Camellia oleifera fruit is beneficial to improve the picking efficiency. However, detection faces new challenges because of the complex field environment. A Camellia oleifera fruit detection method based on YOLOv7 network and multiple data augmentation was proposed to detect Camellia oleifera fruit in complex field scenes. Firstly, the images of Camellia oleifera fruit were collected in the field to establish training and test sets. Detection performance was then compared among YOLOv7, YOLOv5s, YOLOv3-spp and Faster R-CNN networks. The YOLOv7 network with the best performance was selected. A DA-YOLOv7 model was established via the YOLOv7 network combined with various data augmentation methods. The DA-YOLOv7 model had the best detection performance and a strong generalisation ability in complex scenes, with mAP, Precision, Recall, F1 score and average detection time of 96.03%, 94.76%, 95.54%, 95.15% and 0.025 s per image, respectively. Therefore, YOLOv7 combined with data augmentation can be used to detect Camellia oleifera fruit in complex scenes. This study provides a theoretical reference for the detection and harvesting of crops under complex conditions.
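The abstract combines YOLOv7 with multiple data augmentation methods but does not list them here, so the sketch below shows only two common examples of the kind typically used (random horizontal flip with box mirroring, and brightness jitter), with boxes assumed to be in (x1, y1, x2, y2) pixel format.

```python
# NumPy sketch of simple geometric + photometric augmentation for detection.
import numpy as np

def augment(image, boxes, rng):
    image = image.astype(np.float32)
    # Random horizontal flip, mirroring the box x-coordinates.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        w = image.shape[1]
        boxes = boxes.copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]
    # Random brightness jitter.
    image = np.clip(image * rng.uniform(0.7, 1.3), 0, 255)
    return image.astype(np.uint8), boxes

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(416, 416, 3), dtype=np.uint8)
aug_img, aug_boxes = augment(img, np.array([[50, 60, 120, 200]]), rng)
print(aug_img.shape, aug_boxes)
```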

20 citations