Journal ArticleDOI

An improved Yolov3 based on dual path network for cherry tomatoes detection

12 Jul 2021-Journal of Food Process Engineering (John Wiley & Sons, Ltd)-Vol. 44, Iss: 10
About: This article was published in the Journal of Food Process Engineering on 2021-07-12 and has received 20 citations to date.
Citations
Journal ArticleDOI
31 Jan 2022-Agronomy
TL;DR: This paper proposes deep learning models (SSD MobileNet v2 and YOLOv4) to detect tomatoes efficiently, and compares those systems with a proposed histogram-based HSV colour-space model for classifying each tomato and determining its ripening stage, using two acquired image datasets.
Abstract: The harvesting operation is a recurring task in the production of any crop, making it an excellent candidate for automation. In protected horticulture, one of the crops with high added value is the tomato; however, its robotic harvesting is still far from maturity. That said, the development of an accurate fruit detection system is a crucial step towards achieving fully automated robotic harvesting. Deep Learning (DL) and detection frameworks like the Single Shot MultiBox Detector (SSD) or You Only Look Once (YOLO) are more robust and accurate alternatives that respond better to highly complex scenarios. DL can readily be used to detect tomatoes, but when classification is also intended, the task becomes harder and demands a huge amount of data. Therefore, this paper proposes DL models (SSD MobileNet v2 and YOLOv4) to detect tomatoes efficiently and compares those systems with a proposed histogram-based HSV colour-space model that classifies each tomato and determines its ripening stage, using two acquired image datasets. Regarding detection, both models obtained promising results, with the YOLOv4 model standing out with an F1-Score of 85.81%. For the classification task, YOLOv4 was again the best model, with a Macro F1-Score of 74.16%. The HSV colour-space model outperformed the SSD MobileNet v2 model, obtaining results similar to the YOLOv4 model, with a Balanced Accuracy of 68.10%.
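
As a concrete illustration of the histogram-based HSV approach described above, the sketch below classifies a detected tomato crop by its dominant hue. It is a minimal Python sketch assuming OpenCV and NumPy are available; the hue thresholds and stage names are illustrative placeholders, not the paper's actual values.

# Minimal sketch: histogram-based ripeness classification in HSV colour space.
# Assumes OpenCV (cv2) and NumPy; thresholds are illustrative, not the paper's.
import cv2
import numpy as np

def ripeness_stage(bgr_crop, mask=None):
    """Classify a detected tomato crop by its dominant hue."""
    hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV)
    # Histogram over the hue channel (OpenCV hue range is 0..179).
    hist = cv2.calcHist([hsv], [0], mask, [180], [0, 180]).ravel()
    dominant_hue = int(np.argmax(hist))
    # Illustrative thresholds: red wraps around 0/179 on OpenCV's hue circle.
    if dominant_hue <= 10 or dominant_hue >= 160:
        return "ripe"        # red
    elif dominant_hue <= 25:
        return "turning"     # orange
    else:
        return "unripe"      # green/yellow

In a full pipeline, the crop would come from a detector's bounding box (SSD or YOLO above), so detection and colour-based classification stay decoupled.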

27 citations

Journal ArticleDOI
TL;DR: In this article, an improved YOLOv5 deep learning algorithm is used to propose a visual identification method for the growth forms of apples in the orchard, achieving high accuracy and real-time performance, with the mAP reaching 98.4% and an F1 value of 0.928.

13 citations

Journal ArticleDOI
TL;DR: Wang et al. proposed a method for detecting young tomato fruits against a near-color background, based on an improved Faster R-CNN with an attention mechanism; the results show that the mean Average Precision (mAP) of the proposed method reaches 98.46%, and the average detection time per image is only 0.084 s.
Abstract: Information about young tomato fruits has an important impact on monitoring fruit growth, early control of pests and diseases, and yield estimation. It is of great significance for the timely removal of young fruits with abnormal growth status, improving fruit quality, and maintaining high and stable yields. Young tomato fruits are similar in color to stems and leaves, and there are interfering factors such as overlapping fruits, occlusion by stems and leaves, and lighting variation. In order to improve the detection accuracy and efficiency for young tomato fruits, this paper proposes a method for detecting them against a near-color background, based on an improved Faster R-CNN with an attention mechanism. First, ResNet50 is used as the feature-extraction backbone, and the extracted feature map is refined through a Convolutional Block Attention Module (CBAM). Then, a Feature Pyramid Network (FPN) is used to integrate high-level semantic features into low-level detailed features to enhance the model's sensitivity to scale. Finally, Soft Non-Maximum Suppression (Soft-NMS) is used to reduce the missed-detection rate for overlapping fruits. The results show that the mean Average Precision (mAP) of the proposed method reaches 98.46%, and the average detection time per image is only 0.084 s, enabling real-time and accurate detection of young tomato fruits. The research shows that this method can efficiently identify young tomato fruits and provides a better solution for detecting fruits against a near-color background.
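
To make the Soft-NMS step concrete, here is a minimal NumPy sketch of the Gaussian-decay variant: instead of discarding boxes that overlap a higher-scoring detection, their scores are decayed in proportion to overlap, which helps retain genuinely overlapping fruits. The sigma and score-threshold defaults are illustrative, not the paper's settings.

# Minimal sketch: Gaussian Soft-NMS over (N, 4) boxes in (x1, y1, x2, y2) format.
import numpy as np

def box_area(b):
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def iou_one_to_many(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Decay, rather than discard, detections that overlap a kept box."""
    boxes, scores = boxes.astype(float).copy(), scores.astype(float).copy()
    kept_boxes, kept_scores = [], []
    while boxes.shape[0] > 0:
        i = int(np.argmax(scores))
        kept_boxes.append(boxes[i])
        kept_scores.append(scores[i])
        ious = iou_one_to_many(boxes[i], boxes)
        scores = scores * np.exp(-(ious ** 2) / sigma)  # Gaussian penalty
        keep = (np.arange(boxes.shape[0]) != i) & (scores > score_thresh)
        boxes, scores = boxes[keep], scores[keep]
    return np.array(kept_boxes), np.array(kept_scores)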

10 citations

Journal ArticleDOI
TL;DR: Wang et al. proposed an improved Mask R-CNN for visual recognition of cherry tomatoes, which achieved an accuracy of 93.76% for fruit recognition; figures of 11.53% and 15.34% are also reported for stem recognition.

9 citations

Journal ArticleDOI
08 Jul 2022-Agronomy
TL;DR: The proposed SE-YOLOv3-MobileNetV1 model distinguishes tomatoes at four maturity stages, with the mean average precision for tomato reaching 97.5%, and provides a reference for tomato maturity classification by tomato-harvesting robots.
Abstract: The maturity level of a tomato is a key factor in tomato picking, directly determining the transportation distance, storage time, and market freshness of the postharvest tomato. In view of the lack of studies on tomato maturity classification in natural greenhouse environments, this paper proposes an SE-YOLOv3-MobileNetV1 network to classify four tomato maturity stages. The proposed maturity classification model is improved in terms of speed and accuracy: (1) Speed: depthwise separable convolution is used. (2) Accuracy: Mosaic data augmentation, the K-means clustering algorithm, and the Squeeze-and-Excitation attention module are used. To verify the detection performance, the proposed model is compared with current mainstream models, such as YOLOv3, YOLOv3-MobileNetV1, and YOLOv5, in terms of accuracy and speed. The SE-YOLOv3-MobileNetV1 model is able to distinguish tomatoes at four maturity stages, and its mean average precision for tomato reaches 97.5%. The proposed model is 278.6 ms and 236.8 ms faster per detection than the YOLOv3 and YOLOv5 models, respectively. In addition, the proposed model is considerably lighter than YOLOv3 and YOLOv5, which meets the needs of embedded development and provides a reference for tomato maturity classification by tomato-harvesting robots.
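
For illustration, the sketch below combines the two speed/accuracy ingredients named above: a MobileNetV1-style depthwise separable convolution followed by a Squeeze-and-Excitation block. It is a minimal Python sketch assuming PyTorch; the channel sizes and reduction ratio are illustrative, not the paper's configuration.

# Minimal sketch: depthwise separable convolution with an SE attention block.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        # Squeeze: global average pool; excitation: per-channel gates in (0, 1).
        w = x.mean(dim=(2, 3))
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w

class SEDepthwiseSeparable(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1,
                                   groups=in_ch, bias=False)
        # Pointwise: 1x1 conv mixes channels, replacing a full 3x3 conv cheaply.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)
        self.se = SEBlock(out_ch)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return self.se(x)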

8 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
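
The core reformulation is easy to show in code. Below is a minimal sketch of one residual block, assuming PyTorch: the stacked layers learn a residual function F(x) with reference to the input, and the block outputs F(x) + x through an identity shortcut. Layer sizes are illustrative.

# Minimal sketch: a basic residual block with an identity shortcut.
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Identity shortcut: the layers only need to learn the residual F(x),
        # so gradients flow through the addition even in very deep stacks.
        return self.act(self.body(x) + x)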

123,388 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper introduces the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion, alleviating the vanishing-gradient problem, strengthening feature propagation, encouraging feature reuse, and substantially reducing the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
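
The connectivity pattern can be sketched directly: within a dense block, each layer takes the concatenation of all preceding feature maps as input, which is where the L(L+1)/2 direct connections come from. A minimal Python sketch assuming PyTorch follows; the growth rate and layer count are illustrative.

# Minimal sketch: a dense block where every layer sees all earlier features.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Input width grows by `growth_rate` channels per preceding layer.
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate,
                          3, padding=1, bias=False)))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Each layer consumes the concatenation of all earlier feature maps.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)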

27,821 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
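
To make the regression framing concrete, the sketch below decodes a YOLO-style output tensor: an S x S grid where each cell regresses B boxes (x, y, w, h, confidence) plus C conditional class probabilities, and the class-specific score is confidence times class probability. The shapes follow the paper (S=7, B=2, C=20); coordinate decoding and thresholding are simplified, and the random tensor merely stands in for a real network output.

# Minimal sketch: interpreting a YOLO-style S x S x (B*5 + C) output grid.
import numpy as np

S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)  # stand-in for the network output

detections = []
for row in range(S):
    for col in range(S):
        cell = pred[row, col]
        class_probs = cell[B * 5:]          # Pr(class | object) for this cell
        for b in range(B):
            x, y, w, h, conf = cell[b * 5: b * 5 + 5]
            # Class-specific score = box confidence * class probability.
            scores = conf * class_probs
            cls = int(np.argmax(scores))
            detections.append((row, col, b, cls, float(scores[cls])))
# A real pipeline would threshold `detections` and apply NMS afterwards.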

27,256 citations

Journal ArticleDOI
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features—using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3], our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
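
As a concrete picture of the RPN head, the sketch below slides a small network over the shared feature map: a 3x3 convolution followed by two sibling 1x1 convolutions that predict, at each spatial position, objectness scores and box regression offsets for k anchors. It is a minimal Python sketch assuming PyTorch; k = 9 anchors and 512 channels follow the paper's VGG-16 setting, the rest is illustrative.

# Minimal sketch: the RPN head over shared backbone features.
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_ch=512, k=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 512, 3, padding=1)
        self.act = nn.ReLU(inplace=True)
        # Sibling 1x1 heads: 2 scores (object vs. background) and 4 box
        # regression offsets per anchor, at every spatial position.
        self.objectness = nn.Conv2d(512, 2 * k, 1)
        self.bbox_deltas = nn.Conv2d(512, 4 * k, 1)

    def forward(self, features):
        x = self.act(self.conv(features))
        return self.objectness(x), self.bbox_deltas(x)

Top-scoring anchors, after applying the predicted offsets, become the region proposals consumed by the Fast R-CNN detection stage.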

26,458 citations
