Journal ArticleDOI

Towards Automatic Wild Animal Monitoring: Identification of Animal Species in Camera-trap Images using Very Deep Convolutional Neural Networks

01 Sep 2017-Ecological Informatics (Elsevier)-Vol. 41, pp 24-32
TL;DR: In this paper, a very deep convolutional neural network (VDCNN) was used for species classification on the Snapshot Serengeti (SSe) dataset.
About: This article is published in Ecological Informatics. The article was published on 2017-09-01 and is currently open access. It has received 177 citations to date. The article focuses on the topic: convolutional neural networks.
Citations
Posted Content
TL;DR: The authors investigate the ability to automatically, accurately, and inexpensively collect such data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into "big data" sciences.
Abstract: Having accurate, detailed, and up-to-date information about the location and behavior of animals in the wild would revolutionize our ability to study and conserve ecosystems. We investigate the ability to automatically, accurately, and inexpensively collect such data, which could transform many fields of biology, ecology, and zoology into "big data" sciences. Motion sensor "camera traps" enable collecting wildlife pictures inexpensively, unobtrusively, and frequently. However, extracting information from these pictures remains an expensive, time-consuming, manual task. We demonstrate that such information can be automatically extracted by deep learning, a cutting-edge type of artificial intelligence. We train deep convolutional neural networks to identify, count, and describe the behaviors of 48 species in the 3.2-million-image Snapshot Serengeti dataset. Our deep neural networks automatically identify animals with over 93.8% accuracy, and we expect that number to improve rapidly in years to come. More importantly, if our system classifies only images it is confident about, our system can automate animal identification for 99.3% of the data while still performing at the same 96.6% accuracy as that of crowdsourced teams of human volunteers, saving more than 8.4 years (at 40 hours per week) of human labeling effort (i.e. over 17,000 hours) on this 3.2-million-image dataset. Those efficiency gains immediately highlight the importance of using deep neural networks to automate data extraction from camera-trap images. Our results suggest that this technology could enable the inexpensive, unobtrusive, high-volume, and even real-time collection of a wealth of information about vast numbers of animals in the wild.
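As a rough illustration of the confidence-based automation this abstract describes, the sketch below auto-labels only predictions whose top-class probability clears a threshold and defers the rest to human annotators. The model, the 0.95 threshold, and the function name are hypothetical, not the authors' pipeline.

import torch
import torch.nn.functional as F

def classify_or_defer(model, images, threshold=0.95):
    """Auto-label images the network is confident about; defer the rest to humans."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(images), dim=1)   # per-image class probabilities
    conf, preds = probs.max(dim=1)                # top-1 confidence and predicted label
    confident = conf >= threshold                 # mask of images we trust the model on
    return preds[confident], images[~confident]   # auto labels, images for human review

Raising the threshold trades coverage (the fraction of images labeled automatically) for accuracy on the auto-labeled subset, which is the trade-off the abstract's 99.3%/96.6% figures quantify.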

491 citations

Journal ArticleDOI
TL;DR: This paper mainly focuses on the application of deep learning architectures to three major tasks: (i) wild animal detection, (ii) small arm detection and (iii) human being detection.
Abstract: Deep learning has developed as an effective machine learning method that takes in numerous layers of features or representations of the data and provides state-of-the-art results. The application of deep learning has shown impressive performance in various application areas, particularly in image classification, segmentation and object detection. Recent advances in deep learning techniques bring encouraging performance to fine-grained image classification, which aims to distinguish subordinate-level categories. This task is extremely challenging due to high intra-class and low inter-class variance. In this paper, we provide a detailed review of various deep architectures and models, highlighting the characteristics of each. First, we describe the functioning of CNN architectures and their components, followed by a detailed description of various CNN models, from the classical LeNet to AlexNet, ZFNet, GoogLeNet, VGGNet, ResNet, ResNeXt, SENet, DenseNet, Xception, and PNAS/ENAS. We mainly focus on the application of deep learning architectures to three major applications, namely (i) wild animal detection, (ii) small arm detection and (iii) human being detection. A detailed review summary, including the systems, databases, applications and claimed accuracies, is also provided for each model to serve as a guideline for future work in the above application areas.
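In practice, the architectures this review covers are usually applied to tasks like wild animal detection via transfer learning. The sketch below fine-tunes an ImageNet-pretrained ResNet-50 for a hypothetical camera-trap species classifier; the 48-class count and the decision to freeze the backbone are illustrative assumptions, not this paper's method.

import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet-50 and replace its classifier head
# with one sized for a hypothetical 48-species camera-trap dataset.
num_species = 48
net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
net.fc = nn.Linear(net.fc.in_features, num_species)

# Optionally freeze the backbone so only the new head is trained at first.
for name, param in net.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False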

435 citations

Posted Content
TL;DR: Because suitable benchmarks for quantitatively studying how detection and classification algorithms generalize to unfamiliar environments are not yet available, the authors present a dataset designed to measure recognition generalization to novel environments.
Abstract: It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera traps are fixed at one location, hence the background changes little across images; capture is triggered automatically, hence there is no human bias. The challenge is learning recognition in a handful of locations, and generalizing animal detection and classification to new locations where no training data is available. In our experiments state-of-the-art algorithms show excellent performance when tested at the same location where they were trained. However, we find that generalization to new locations is poor, especially for classification systems.
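The evaluation protocol this dataset supports amounts to splitting by camera location rather than by image, so test locations are never seen during training. A minimal sketch, with hypothetical record fields:

def split_by_location(records, train_locations):
    """Split image records by camera location so test sites are unseen in training."""
    train, test = [], []
    for rec in records:
        (train if rec["location"] in train_locations else test).append(rec)
    return train, test

# Hypothetical usage: 20 cameras, the first 15 for training, the rest held out.
records = [{"path": f"img{i:04d}.jpg", "location": i % 20, "label": "animal"}
           for i in range(100)]
train_set, test_set = split_by_location(records, train_locations=set(range(15)))

A random per-image split would leak background appearance from every camera into training, which is exactly the optimistic bias this benchmark is built to expose.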

298 citations

Journal ArticleDOI
TL;DR: It is argued that at a time when automatic monitoring of populations and ecosystems generates a vast amount of data that cannot be effectively processed by humans anymore, deep learning could become a powerful reference tool for ecologists.
Abstract: A lot of hype has recently been generated around deep learning, a group of artificial intelligence approaches able to break accuracy records in pattern recognition. Over the course of just a few years, deep learning revolutionized several research fields such as bioinformatics and medicine. Yet such a surge of tools and knowledge is still in its infancy in ecology, despite the ever-growing size and complexity of ecological datasets. Here we performed a literature review of deep learning implementations in ecology to identify its benefits across most ecological disciplines, including applied ecology, for decision makers and conservationists alike. We also provide guidelines on useful resources and recommendations for ecologists to start adding deep learning to their toolkit. At a time when automatic monitoring of populations and ecosystems generates a vast amount of data that cannot be processed by humans anymore, deep learning could become a necessity in ecology.

276 citations

Book ChapterDOI
08 Sep 2018
TL;DR: The Caltech Camera Traps dataset presented in this paper is designed to measure recognition generalization to novel environments: cameras are fixed at one location, hence the background changes little across images, and capture is triggered automatically, hence there is no human bias.
Abstract: It is desirable for detection and classification algorithms to generalize to unfamiliar environments, but suitable benchmarks for quantitatively studying this phenomenon are not yet available. We present a dataset designed to measure recognition generalization to novel environments. The images in our dataset are harvested from twenty camera traps deployed to monitor animal populations. Camera traps are fixed at one location, hence the background changes little across images; capture is triggered automatically, hence there is no human bias. The challenge is learning recognition in a handful of locations, and generalizing animal detection and classification to new locations where no training data is available. In our experiments state-of-the-art algorithms show excellent performance when tested at the same location where they were trained. However, we find that generalization to new locations is poor, especially for classification systems. (The dataset is available at https://beerys.github.io/CaltechCameraTraps/)

259 citations

References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on ImageNet classification, as discussed by the authors.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
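A sketch of the layer layout the abstract describes, for a 224x224 RGB input. The channel and kernel sizes follow the commonly cited AlexNet configuration and are illustrative rather than verified against the paper's exact figures.

import torch.nn as nn

# Five convolutional layers (some followed by max-pooling) and three
# fully-connected layers ending in a 1000-way classifier.
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(0.5),                       # the "dropout" regularizer the abstract mentions
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),                 # 1000-way softmax is applied in the loss
)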

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
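The paper's central design choice is building depth from stacks of very small 3x3 convolutions. The sketch below shows an illustrative VGG-style stage under that principle; it is not the authors' released model.

import torch.nn as nn

def vgg_stage(in_ch, out_ch, num_convs):
    """A VGG-style stage: repeated 3x3 convolutions followed by 2x2 max-pooling."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Hypothetical usage: the first two stages of a VGG-16-style network.
features = nn.Sequential(vgg_stage(3, 64, 2), vgg_stage(64, 128, 2))

Two stacked 3x3 convolutions cover the same receptive field as one 5x5 while using fewer parameters and adding an extra nonlinearity, which is what makes 16-19 weight layers tractable.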

55,235 citations

Posted Content
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
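A minimal sketch of the residual reformulation the abstract describes: the block learns a residual function F(x) and outputs F(x) + x via an identity shortcut. Channel handling is simplified here (identity shortcuts only, no projection for dimension changes).

import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.f(x) + x)   # residual function plus identity shortcut

Because the shortcut passes x through unchanged, a block can default to the identity mapping, which is what makes networks of 100+ layers optimizable.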

44,703 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: Convolutional neural networks trained with gradient-based learning are shown to outperform all other techniques on handwritten character recognition, and a new learning paradigm, graph transformer networks (GTNs), allows multi-module recognition systems to be trained globally.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
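For the handwritten-digit task the abstract discusses, a LeNet-style network alternates small convolutions with subsampling before a fully-connected classifier. The sketch below is illustrative; the paper's LeNet-5 differs in some details (e.g. its subsampling layers are learned, not plain average pooling).

import torch.nn as nn

# A small LeNet-style network for 28x28 grayscale digit images (10 classes).
lenet_like = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(),
    nn.AvgPool2d(kernel_size=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),
    nn.AvgPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),
)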

42,067 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception, a deep convolutional neural network architecture, achieved the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14), as discussed by the authors.
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
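A sketch of the multi-scale idea behind the architecture: parallel 1x1, 3x3, and 5x5 convolution branches plus a pooling branch, concatenated along the channel dimension, with 1x1 "reduce" convolutions keeping the computational budget constant. The channel-count parameters are illustrative assumptions.

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 conv branches plus pooling, concatenated on channels."""
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(True))
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(True),
                                nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU(True))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(True),
                                nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU(True))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(True))

    def forward(self, x):
        # All branches preserve spatial size, so outputs stack on the channel axis.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)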

40,257 citations