Open accessJournal Article

# Apple Detection in Complex Scene Using the Improved YOLOv4 Model

04 Mar 2021-Agronomy (Multidisciplinary Digital Publishing Institute)-Vol. 11, Iss: 3, pp 476
Abstract: To enable the apple picking robot to quickly and accurately detect apples under the complex background in orchards, we propose an improved You Only Look Once version 4 (YOLOv4) model and data augmentation methods. Firstly, the crawler technology is utilized to collect pertinent apple images from the Internet for labeling. For the problem of insufficient image data caused by the random occlusion between leaves, in addition to traditional data augmentation techniques, a leaf illustration data augmentation method is proposed in this paper to accomplish data augmentation. Secondly, due to the large size and calculation of the YOLOv4 model, the backbone network Cross Stage Partial Darknet53 (CSPDarknet53) of the YOLOv4 model is replaced by EfficientNet, and convolution layer (Conv2D) is added to the three outputs to further adjust and extract the features, which make the model lighter and reduce the computational complexity. Finally, the apple detection experiment is performed on 2670 expanded samples. The test results show that the EfficientNet-B0-YOLOv4 model proposed in this paper has better detection performance than YOLOv3, YOLOv4, and Faster R-CNN with ResNet, which are state-of-the-art apple detection model. The average values of Recall, Precision, and F1 reach 97.43%, 95.52%, and 96.54% respectively, the average detection time per frame of the model is 0.338 s, which proves that the proposed method can be well applied in the vision system of picking robots in the apple industry.

##### Citations
More

5 results found

Open accessJournal Article
14 Jul 2021-Sensors
Abstract: This study aimed to produce a robust real-time pear fruit counter for mobile applications using only RGB data, the variants of the state-of-the-art object detection model YOLOv4, and the multiple object-tracking algorithm Deep SORT. This study also provided a systematic and pragmatic methodology for choosing the most suitable model for a desired application in agricultural sciences. In terms of accuracy, YOLOv4-CSP was observed as the optimal model, with an AP@0.50 of 98%. In terms of speed and computational cost, YOLOv4-tiny was found to be the ideal model, with a speed of more than 50 FPS and FLOPS of 6.8–14.5. If considering the balance in terms of accuracy, speed and computational cost, YOLOv4 was found to be most suitable and had the highest accuracy metrics while satisfying a real time speed of greater than or equal to 24 FPS. Between the two methods of counting with Deep SORT, the unique ID method was found to be more reliable, with an F1count of 87.85%. This was because YOLOv4 had a very low false negative in detecting pear fruits. The ROI line is more reliable because of its more restrictive nature, but due to flickering in detection it was not able to count some pears despite their being detected.

Topics: sort (51%)

3 Citations

Open accessJournal Article
Abstract: In recent years, vehicle detection and classification have become essential tasks of intelligent transportation systems, and real-time, accurate vehicle detection from image and video data for traffic monitoring remains challenging. The most noteworthy challenges are real-time system operation to accurately locate and classify vehicles in traffic flows and working around total occlusions that hinder vehicle tracking. For real-time traffic monitoring, we present a traffic monitoring approach that overcomes the abovementioned challenges by employing convolutional neural networks that utilize You Only Look Once (YOLO). A real-time traffic monitoring system has been developed, and it has attracted significant attention from traffic management departments. Digitally processing and analyzing these videos in real time is crucial for extracting reliable data on traffic flow. Therefore, this study presents a real-time traffic monitoring system based on a virtual detection zone, Gaussian mixture model (GMM), and YOLO to increase the vehicle counting and classification efficiency. GMM and a virtual detection zone are used for vehicle counting, and YOLO is used to classify vehicles. Moreover, the distance and time traveled by a vehicle are used to estimate the speed of the vehicle. In this study, the Montevideo Audio and Video Dataset (MAVD), the GARM Road-Traffic Monitoring data set (GRAM-RTM), and our collection data sets are used to verify the proposed method. Experimental results indicate that the proposed method with YOLOv4 achieved the highest classification accuracy of 98.91% and 99.5% in MAVD and GRAM-RTM data sets, respectively. Moreover, the proposed method with YOLOv4 also achieves the highest classification accuracy of 99.1%, 98.6%, and 98% in daytime, night time, and rainy day, respectively. In addition, the average absolute percentage error of vehicle speed estimation with the proposed method is about 7.6%.

Topics: , Traffic flow (53%),

Book Chapter
Zhenxin Yuan1, Shuai Zhao, Fu Zhao, Tao Ma  +1 moreInstitutions (1)
22 Oct 2021-
Abstract: At present, the production efficiency of automobile rim in the industrial field is affected by the detection process of automobile rim quality after steel forging. The traditional way is to check welding position manually, which can facilitate the air tightness detection after weld is pressurized. However, this can largely affect production efficiency. By introducing computer vision and image processing, the position of the rim weld can be accurately located, which is more accurate and time-saving. In order to ensure high accuracy and speed of detection, we propose an automobile rim weld detection algorithm YOLOv4-head2-BiFPN on the basis of YOLOv4 algorithm. The experimental results show that, for one thing, it does not affect the detection speed by strengthening feature fusion and removing redundant detection heads. For another, the AP75 of the improved YOLOv4-head2-BiFPN algorithm in the automobile rim weld detection task is 7.7% higher than that of the original YOLOv4 algorithm.

Topics: Object detection (55%)

Proceedings Article
Fangfang Chen1, Xiaowei Xu1, Ye Tao2, Xiaodong Wang1  +2 moreInstitutions (2)
23 Jul 2021-
Abstract: Image recognition of refrigerator items has become one of the hotspots in computer vision with the widespread usage of an intelligent refrigerator. There are some problems, such as slow detection speed, missed detection, false detection, when applying a convolutional neural network for the recognition of refrigerator image. To further improve the recognition effect, we proposed an improved YOLOv3-45k method. Firstly, the number of residual modules in feature extraction network is reduced, so that to improve the detection speed. Secondly, the $\mathbf{K}-\mathbf{means}++$ clustering algorithm is used to obtain the anchors of refrigerator items, which is adjusted reasonably by manual method, and then corresponding to their respective feature layers to improve the detection accuracy. The experimental results show that the proposed algorithm can detect 57 images per second on the refrigerator item data set, and the mAP is 84.40%. Compared with the original YOLOv3 algorithm, the proposed YOLOv3-45k method has better detection accuracy and detection speed in the refrigerator item data set.

Topics: Feature extraction (55%), , Cluster analysis (51%)
##### References
More

35 results found

Open accessProceedings Article
Kaiming He1, Xiangyu Zhang1, Shaoqing Ren1, Jian Sun1Institutions (1)
27 Jun 2016-
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Topics: Deep learning (53%), Residual (53%),  ... read more

93,356 Citations

Open accessProceedings Article
03 Dec 2012-
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Topics: , Deep learning (59%),  ... read more

73,871 Citations

Open accessProceedings Article
Karen Simonyan1, Andrew Zisserman1Institutions (1)
01 Jan 2015-
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,857 Citations

Open accessProceedings Article
Karen Simonyan1, Andrew Zisserman1Institutions (1)
04 Sep 2014-
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

Topics: , Image translation (50%)

38,283 Citations

Journal ArticleDOI: 10.1109/5.726791
Yann LeCun1, Léon Bottou2, Léon Bottou3, Yoshua Bengio3  +3 moreInstitutions (5)
01 Jan 1998-
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.