
Showing papers in "Methods in Ecology and Evolution in 2021"


Journal ArticleDOI
TL;DR: raxmlGUI 2.0 is a complete rewrite of the graphical user interface to RAxML, one of the most popular and widely used programs for phylogenetic inference using maximum likelihood; it seamlessly integrates RAxML binaries for all major operating systems with an intuitive graphical front-end.
Abstract: RaxmlGUI is a graphical user interface to RAxML, one of the most popular and widely used software for phylogenetic inference using maximum likelihood. Here we present raxmlGUI 2.0, a complete rewrite of the GUI which seamlessly integrates RAxML binaries for all major operating systems with an intuitive graphical front-end to set up and run phylogenetic analyses. Our program offers automated pipelines for analyses that require multiple successive calls of RAxML and built-in functions to concatenate alignment files while automatically specifying the appropriate partition settings. In addition to RAxML 8.x, raxmlGUI 2.0 also supports the new RAxML Next Generation. RaxmlGUI facilitates phylogenetic analyses by coupling an intuitive interface with the unmatched performance of RAxML.

264 citations
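The alignment-concatenation step with automatically generated partition settings can be sketched as follows (hypothetical helper code, not taken from raxmlGUI; the `concatenate` function and its inputs are invented for illustration, but the emitted lines follow the standard `DNA, name = start-end` RAxML partition-file syntax):

```python
def concatenate(alignments):
    """alignments: list of (name, {taxon: sequence}) with equal-length
    sequences within each alignment. Returns the concatenated supermatrix
    and the corresponding RAxML partition lines."""
    taxa = sorted({t for _, aln in alignments for t in aln})
    supermatrix = {t: "" for t in taxa}
    partitions, start = [], 1
    for name, aln in alignments:
        length = len(next(iter(aln.values())))
        for t in taxa:
            # Taxa missing from this gene are padded with gaps.
            supermatrix[t] += aln.get(t, "-" * length)
        partitions.append(f"DNA, {name} = {start}-{start + length - 1}")
        start += length
    return supermatrix, partitions
```

Handing the supermatrix plus the partition lines to RAxML is what the GUI automates behind the scenes.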


Journal ArticleDOI
TL;DR: The authors find insufficient reporting of model performance and parameterization, heavy reliance on model selection with AICc and low utilization of spatial cross-validation, and explain how ENMeval 2.0 can help address these issues.

158 citations



Journal ArticleDOI
TL;DR: A methodology is suggested that delineates the 'area of applicability' (AOA) based on the minimum distance to the training data in the multidimensional predictor space; the relationship between this dissimilarity index (DI) and cross-validation performance shows potential to limit predictions to the area where a user-defined performance applies.
Abstract: Predictive modelling using machine learning has become very popular for spatial mapping of the environment. Models are often applied to make predictions far beyond sampling locations, where new geographic locations might differ considerably from the training data in their environmental properties. However, areas in the predictor space without support of training data are problematic. Since the model has no knowledge about these environments, predictions there have to be considered uncertain, and estimating the area to which a prediction model can be reliably applied is required. Here, we suggest a methodology that delineates the "area of applicability" (AOA), which we define as the area for which the cross-validation error of the model applies. We first propose a "dissimilarity index" (DI) that is based on the minimum distance to the training data in the predictor space, with predictors being weighted by their respective importance in the model. The AOA is then derived by applying a threshold based on the DI of the training data, where the DI is calculated with respect to the cross-validation strategy used for model training. We test for the ideal threshold by using simulated data and compare the prediction error within the AOA with the cross-validation error of the model, illustrating the approach with a simulated case study. Our simulation study suggests a threshold on DI to define the AOA at the 0.95 quantile of the DI in the training data. Using this threshold, the prediction error within the AOA is comparable to the cross-validation RMSE of the model, while the cross-validation error does not apply outside the AOA. This holds for models trained with randomly distributed training data, as well as when training data are clustered in space and spatial cross-validation is applied. We suggest reporting the AOA alongside predictions, complementary to validation measures.

98 citations
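The DI/AOA construction can be sketched compactly (a simplified stand-in for the method described above: the normalisation by the mean pairwise training distance and the leave-one-out thresholding are assumptions made for illustration, not the authors' exact implementation):

```python
import math

def dissimilarity_index(point, train, weights):
    """Minimum weighted Euclidean distance from `point` to the training
    set, scaled by the mean pairwise distance within the training data
    (a simplified normalisation standing in for the paper's)."""
    def wdist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
    d_min = min(wdist(point, t) for t in train)
    pairwise = [wdist(a, b) for i, a in enumerate(train) for b in train[i + 1:]]
    return d_min / (sum(pairwise) / len(pairwise))

def aoa_threshold(train, weights, q=0.95):
    """Threshold on the DI: the q-quantile of leave-one-out DI values of
    the training points themselves, mirroring the 0.95 quantile the
    simulation study suggests."""
    di = sorted(dissimilarity_index(t, train[:i] + train[i + 1:], weights)
                for i, t in enumerate(train))
    return di[min(len(di) - 1, int(q * len(di)))]
```

A prediction location whose DI exceeds the threshold falls outside the AOA and its prediction should be treated as unsupported.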


Journal ArticleDOI
TL;DR: This paper focuses on the most widely used single-board computer, the Raspberry Pi, reviews its broad applications and uses across the biological domain, and provides detailed guidelines, recommendations and considerations to help accelerate the uptake of the Pi by the scientific community.
Abstract: Handling Editor: Chloe Robinson. 1. The field of biology has seen tremendous technological progress in recent years, fuelled by the exponential growth in processing power and high-level computing, and the rise of global information sharing. Low-cost single-board computers are predicted to be one of the key technological advancements to further revolutionise this field. 2. So far, an overview of current uptake of these devices and a general guide to help researchers integrate them in their work has been missing. In this paper I focus on the most widely used single-board computer, the Raspberry Pi, and review its broad applications and uses across the biological domain. 3. Since its release in 2012, the Raspberry Pi has been increasingly taken up by biologists, in the laboratory, the field and in the classroom, and across a wide range of disciplines. A hugely diverse range of applications exists, ranging from simple solutions to dedicated custom-built devices, including nest-box monitoring, wildlife camera trapping, high-throughput behavioural recording, large-scale plant phenotyping, underwater video surveillance, closed-loop operant learning experiments and autonomous ecosystem monitoring. Despite the breadth of its implementations, the depth of uptake of the Raspberry Pi by the scientific community is still limited. 4. The broad capabilities of the Raspberry Pi, combined with its low cost, ease of use and large user community, make it a great research tool for almost any project. To help accelerate the uptake of the Raspberry Pi by the scientific community, I provide detailed guidelines, recommendations and considerations, and 30+ step-by-step guides on a dedicated accompanying website (http://raspberrypiguide.github.io). I hope this paper will help generate more awareness about the Raspberry Pi among scientists and thereby both fuel the democratisation of science and ultimately help advance our understanding of biology, from the micro- to the macro-scale.

78 citations


Journal ArticleDOI
TL;DR: This paper combines the power of machine intelligence and human intelligence via a novel active learning system to minimize the manual work required to train a computer vision model, and is the first work to apply an active learning approach to camera trap images.
Abstract: 1. A typical camera trap survey may produce millions of images that require slow, expensive manual review. Consequently, critical conservation questions may be answered too slowly to support decision‐making. Recent studies demonstrated the potential for computer vision to dramatically increase efficiency in image‐based biodiversity surveys; however, the literature has focused on projects with a large set of labeled training images, and hence many projects with a smaller set of labeled images cannot benefit from existing machine learning techniques. Furthermore, even sizable projects have struggled to adopt computer vision methods because classification models overfit to specific image backgrounds (i.e., camera locations). 2. In this paper, we combine the power of machine intelligence and human intelligence via a novel active learning system to minimize the manual work required to train a computer vision model. Furthermore, we utilize object detection models and transfer learning to prevent overfitting to camera locations. To our knowledge, this is the first work to apply an active learning approach to camera trap images. 3. Our proposed scheme can match state‐of‐the‐art accuracy on a 3.2 million image dataset with as few as 14,100 manual labels, which means decreasing manual labeling effort by over 99.5%. Our trained models are also less dependent on background pixels, since they operate only on cropped regions around animals. 4. The proposed active deep learning scheme can significantly reduce the manual labor required to extract information from camera trap images. Automation of information extraction will not only benefit existing camera trap projects, but can also catalyze the deployment of larger camera trap arrays.

73 citations
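The active-learning loop at the heart of this approach can be reduced to a generic uncertainty-sampling step (an illustrative sketch, not the authors' system; `predict_proba` stands in for any trained binary classifier):

```python
def uncertainty_sampling(unlabelled, predict_proba, batch_size):
    """Return the batch_size items whose predicted positive-class
    probability is closest to 0.5, i.e. those the current model is least
    sure about. In a camera-trap workflow these would be sent to a human
    annotator, and the model retrained on the enlarged labelled set."""
    return sorted(unlabelled, key=lambda x: abs(predict_proba(x) - 0.5))[:batch_size]
```

Iterating this select-label-retrain cycle is what lets a small manual-labelling budget match the accuracy of a fully labelled dataset.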




Journal ArticleDOI
Abstract: J.L. and P.F. were supported by the Czech Science Foundation grant 20-17282S. Z.B.-D. was supported by NKFIH (project K 124671). F.d.B. was supported by the Plan Nacional de I+D+i (project PGC2018-099027-B-I00).

70 citations


Journal ArticleDOI
TL;DR: An overview of the most commonly used social network measures in animal research for static or time-aggregated networks is provided, along with a guideline on how to use them depending on the data collection protocol, the social system studied and the research question addressed.
Abstract: 1. We provide an overview of the most commonly used social network measures in animal research for static networks or time-aggregated networks. 2. For each of these measures, we provide clear explanations as to what they measure, we describe their respective variants, we underline the necessity to consider these variants according to the research question addressed, and we indicate considerations that have not been taken so far. 3. We provide a guideline indicating how to use them depending on the data collection protocol, the social system studied and the research question addressed. Finally, we inform about the existing gaps and remaining challenges in the use of several variants and provide future research directions.

66 citations
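Two of the most common static measures of this kind, binary degree and strength, can be computed from a weighted edge list in a few lines (a generic sketch; the function name and input format are invented for illustration):

```python
def degree_and_strength(edges):
    """Binary degree (number of distinct partners) and strength (summed
    edge weights) per node, from an undirected weighted edge list of
    (node_u, node_v, weight) tuples."""
    partners, strength = {}, {}
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            partners.setdefault(a, set()).add(b)
            strength[a] = strength.get(a, 0.0) + w
    return {n: len(s) for n, s in partners.items()}, strength
```

Which of the two variants is appropriate depends on the data collection protocol and research question, which is exactly the point of the guideline above.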



Journal ArticleDOI
TL;DR: It is shown that datastream permutations typically do not represent the null hypothesis of interest to researchers interfacing animal social network analysis with regression modelling, and simulations are used to demonstrate the potential pitfalls of using this methodology.
Abstract: 1. Social network methods have become a key tool for describing, modelling and testing hypotheses about the social structures of animals. However, due to the non-independence of network data and the presence of confounds, specialized statistical techniques are often needed to test hypotheses in these networks. Datastream permutations, originally developed to test the null hypothesis of random social structure, have become a popular tool for testing a wide array of null hypotheses. In particular, they have been used to test whether exogenous factors are related to network structure by interfacing these permutations with regression models. 2. Here, we show that these datastream permutations typically do not represent the null hypothesis of interest to researchers interfacing animal social network analysis with regression modelling, and use simulations to demonstrate the potential pitfalls of using this methodology. 3. Our simulations show that utilizing common datastream permutations to test the coefficients of regression models can lead to extremely high type I (false-positive) error rates (>30%) in the presence of non-random social structure. The magnitude of this problem is primarily dependent on the degree of non-randomness within the social structure and the intensity of sampling. 4. We strongly recommend against utilizing datastream permutations to test regression models in animal social networks. We suggest that a potential solution may be found in regarding the problems of non-independence of network data and unreliability of observations as separate problems with distinct solutions.
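For contrast, a node-label permutation of the kind often discussed as an alternative null for regression coefficients can be sketched as follows (a generic illustration, not the authors' code; the simple slope statistic and two-sided count are conventional choices made here for clarity):

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def node_permutation_p(trait, degree, n_perm=2000, seed=1):
    """Two-sided permutation p-value for the regression of network degree
    on a node trait: shuffle trait values across nodes and count how often
    the permuted |slope| matches or exceeds the observed one."""
    rng = random.Random(seed)
    observed = abs(slope(trait, degree))
    hits = 0
    for _ in range(n_perm):
        perm = trait[:]
        rng.shuffle(perm)
        if abs(slope(perm, degree)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction
```

The key difference from a datastream permutation is that the network itself is held fixed and only the node-level covariate is randomised, which is the null hypothesis a regression coefficient actually tests.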

Journal ArticleDOI
TL;DR: This work presents a new, reproducible pipeline in R that allows relatively simple fitting of 24 different TPC models using non-linear least squares (NLLS) regression, and demonstrates how this pipeline can be combined with other packages in R to robustly and reproducibly fit multiple mathematical models to multiple TPC datasets at once.
Abstract: 1. The quantification of thermal performance curves (TPCs) for biological rates has many applications to problems such as predicting species' responses to climate change. There is currently no widely used open-source pipeline to fit mathematical TPC models to data, which limits the transparency and reproducibility of the curve fitting process underlying applications of TPCs. 2. We present a new pipeline in R that currently allows for reproducible fitting of 24 different TPC models using non-linear least squares (NLLS) regression. The pipeline consists of two packages, rTPC and nls.multstart, that allow multiple start values for NLLS fitting and provide helper functions for setting start parameters. This pipeline overcomes previous problems that have made NLLS fitting and estimation of key parameters difficult or unreliable. 3. We demonstrate how rTPC and nls.multstart can be combined with other packages in R to robustly and reproducibly fit multiple models to multiple TPC datasets at once. In addition, we show how model selection or averaging, weighted model fitting, and bootstrapping can easily be implemented within the pipeline. 4. This new pipeline provides a flexible and reproducible approach that makes the challenging task of fitting multiple TPC models to data accessible to a wide range of users.
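The multiple-start-value strategy that makes NLLS fitting of TPC models reliable can be illustrated in plain Python (a crude sketch of the idea behind nls.multstart, not its actual implementation; the Gaussian curve form and the pattern-search refinement are illustrative choices):

```python
import math
import random

def gaussian_tpc(temp, rmax, topt, width):
    """A simple Gaussian-shaped TPC (one of many possible model forms)."""
    return rmax * math.exp(-((temp - topt) ** 2) / (2 * width ** 2))

def sse(params, data):
    """Sum of squared errors of the model against (temperature, rate) pairs."""
    return sum((rate - gaussian_tpc(temp, *params)) ** 2 for temp, rate in data)

def fit_multistart(data, n_starts=30, sweeps=300, seed=0):
    """Random-restart least squares: draw several start values, refine each
    with a crude coordinate-wise pattern search using shrinking steps, and
    keep the best fit. Multiple starts reduce the risk of an NLLS fit
    stalling in a poor local optimum."""
    rng = random.Random(seed)
    temps = [t for t, _ in data]
    rates = [r for _, r in data]
    best = None
    for _ in range(n_starts):
        p = [rng.uniform(0.5, 2.0) * max(rates),        # rmax start
             rng.uniform(min(temps), max(temps)),        # topt start
             rng.uniform(1.0, max(temps) - min(temps))]  # width start
        step = [abs(v) * 0.5 + 0.1 for v in p]
        for _ in range(sweeps):
            for i in range(3):
                for cand in (p[i] - step[i], p[i] + step[i]):
                    trial = p[:]
                    trial[i] = cand
                    if sse(trial, data) < sse(p, data):
                        p = trial
                step[i] *= 0.9
        if best is None or sse(p, data) < sse(best, data):
            best = p
    return best
```

The real pipeline does the same thing with proper NLLS optimisers and 24 published model forms; the point here is only that best-of-many-starts is what makes the fits reproducible.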

Journal ArticleDOI
TL;DR: It is found that automated detection could be achieved for a wider range of species and under a greater variety of environmental conditions than reported in previous reviews of automated and manual detection in drone-acquired imagery.
Abstract: Accurate detection of individual animals is integral to the management of vulnerable wildlife species, but often difficult and costly to achieve for species that occur over wide or inaccessible areas or engage in cryptic behaviours. There is a growing acceptance of the use of drones (also known as unmanned aerial vehicles, UAVs and remotely piloted aircraft systems, RPAS) to detect wildlife, largely because of the capacity for drones to rapidly cover large areas compared to ground survey methods. While drones can aid the capture of large amounts of imagery, detection requires either manual evaluation of the imagery or automated detection using machine learning algorithms. While manual evaluation of drone-acquired imagery is possible and sometimes necessary, the powerful combination of drones with automated detection of wildlife in this imagery is much faster and, in some cases, more accurate than using human observers. Despite the great potential of this emerging approach, most attention to date has been paid to the development of algorithms, and little is known about the constraints around successful detection (P. W. J. Baxter, and G. Hamilton, 2018, Ecosphere, 9, e02194). We reviewed studies that were conducted over the last 5 years in which wildlife species were detected automatically in drone-acquired imagery to understand how technological constraints, environmental conditions and ecological traits of target species impact detection with automated methods. From this review, we found that automated detection could be achieved for a wider range of species and under a greater variety of environmental conditions than reported in previous reviews of automated and manual detection in drone-acquired imagery. 
A high probability of automated detection could be achieved efficiently using fixed-wing platforms and RGB sensors for species that were large and occurred in open and homogeneous environments with little vegetation or variation in topography, while infrared sensors and multirotor platforms were necessary to successfully detect small, elusive species in complex habitats. The insight gained in this review could allow conservation managers to use drones and machine learning algorithms more accurately and efficiently to collect abundance data on vulnerable populations that are critical to their conservation.

Journal ArticleDOI
TL;DR: SlicerMorph provides users with modules to conveniently retrieve open-access 3D models or import users' own 3D volumes, to annotate 3D curve and patch-based landmarks, generate landmark templates, conduct geometric morphometric analyses of 3D organismal form using both landmark-driven and landmark-free approaches, and create 3D animations from their results.


Journal ArticleDOI
Abstract: This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2021 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. 1 Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, USA; 2 Department of Biology, University of New Mexico, Albuquerque, NM, USA.


Journal ArticleDOI
TL;DR: This work discusses the freely available Flora Incognita app for Android, iOS and Harmony OS devices, which allows users to interactively identify plant species and capture their observations; its deep learning algorithms, trained on an extensive repository of plant observations, classify plant images with unprecedented accuracy.

Journal ArticleDOI
TL;DR: The capabilities of the R package corHMM are expanded to handle n‐state and n‐character problems and provide users with a streamlined set of functions to create custom HMMs for any biological question of arbitrary complexity, finding that an HMM is an appropriate model when the degree of rate heterogeneity is moderate to high.
Abstract: 1. Hidden Markov models (HMMs) have emerged as an important tool for understanding the evolution of characters that take on discrete states. Their flexibility and biological sensibility make them appealing for many phylogenetic comparative applications. 2. Previously available packages placed unnecessary limits on the number of observed and hidden states that can be considered when estimating transition rates and inferring ancestral states on a phylogeny. 3. To address these issues, we expanded the capabilities of the R package corHMM to handle n-state and n-character problems and provide users with a streamlined set of functions to create custom HMMs for any biological question of arbitrary complexity. 4. We show that increasing the number of observed states increases the accuracy of ancestral state reconstruction. We also explore the conditions under which an HMM is most effective, finding that an HMM outperforms a standard Markov model when the degree of rate heterogeneity is moderate to high. 5. Finally, we demonstrate the importance of these generalizations by reconstructing the morphology of the ancestral angiosperm flower. Exactly opposite to previous results, we find the most likely state to be a spiral perianth, spiral androecium and whorled gynoecium. The difference between our analysis and previous studies is that our modelling allowed for the correlated evolution of several flower characters.
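As a minimal illustration of the machinery underlying such models, the two-state case of a continuous-time Markov model of discrete character evolution has a closed-form transition probability matrix (a generic sketch, not corHMM code; models of the corHMM family generalise this to n states plus hidden rate classes):

```python
import math

def two_state_transition(a, b, t):
    """P(t) = exp(Q t) for the 2x2 rate matrix Q = [[-a, a], [b, -b]],
    where a is the 0->1 rate and b the 1->0 rate, using the known closed
    form for the two-state case."""
    s = a + b
    e = math.exp(-s * t)
    return [[b / s + (a / s) * e, a / s - (a / s) * e],
            [b / s - (b / s) * e, a / s + (b / s) * e]]
```

At t = 0 this is the identity matrix, and as t grows each row converges to the stationary distribution (b/(a+b), a/(a+b)), which is why long branches carry little information about ancestral states.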

Journal ArticleDOI
TL;DR: A standardisation framework adhering to existing data principles and involving the use of simple templates to create a data flow from manufacturers and researchers to compliant repositories is proposed, and will provide a starting point for broader efforts to establish interoperable bio‐logging data formats across all fields in animal ecology.
Abstract: Bio-logging data obtained by tagging animals are key to addressing global conservation challenges. However, the many thousands of existing bio-logging datasets are not easily discoverable, universally comparable, nor readily accessible through existing repositories and across platforms, slowing down ecological research and effective management. A set of universal standards is needed to ensure discoverability, interoperability and effective translation of bio-logging data into research and management recommendations. We propose a standardisation framework adhering to existing data principles (FAIR: Findable, Accessible, Interoperable and Reusable; and TRUST: Transparency, Responsibility, User focus, Sustainability and Technology) and involving the use of simple templates to create a data flow from manufacturers and researchers to compliant repositories, where automated procedures should be in place to prepare data availability into four standardised levels: (a) decoded raw data, (b) curated data, (c) interpolated data and (d) gridded data. Our framework allows for integration of simple tabular arrays (e.g. csv files) and creation of sharable and interoperable network Common Data Form (netCDF) files containing all the needed information for accuracy-of-use, rightful attribution (ensuring data providers keep ownership through the entire process) and data preservation security. We show the standardisation benefits for all stakeholders involved, and illustrate the application of our framework by focusing on marine animals and by providing examples of the workflow across all data levels, including filled templates and code to process data between levels, as well as templates to prepare netCDF files ready for sharing. Adoption of our framework will facilitate collection of Essential Ocean Variables (EOVs) in support of the Global Ocean Observing System (GOOS) and inter-governmental assessments (e.g. the World Ocean Assessment), and will provide a starting point for broader efforts to establish interoperable bio-logging data formats across all fields in animal ecology.

Journal ArticleDOI
TL;DR: It is shown that estimates based on polymorphic markers only are always biased by global sample size, and that when nucleotide sites with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each site.
Abstract: Heterozygosity is a metric of genetic variability frequently used to inform the management of threatened taxa. Estimating observed and expected heterozygosities from genome-wide sequence data has become increasingly common, and these estimates are often derived directly from genotypes at single nucleotide polymorphism (SNP) markers. While many SNP markers can provide precise estimates of genetic processes, the results of ‘downstream’ analysis with these markers may depend heavily on ‘upstream’ filtering decisions. Here we explore the downstream consequences of sample size, rare allele filtering, missing data thresholds and known population structure on estimates of observed and expected heterozygosity using two reduced-representation sequencing datasets, one from the mosquito Aedes aegypti (ddRADseq) and the other from a threatened grasshopper, Keyacris scurra (DArTseq). We show that estimates based on polymorphic markers only (i.e. SNP heterozygosity) are always biased by global sample size (N), with smaller N producing larger estimates. By contrast, results are unbiased by sample size when calculations consider monomorphic as well as polymorphic sequence information (i.e. genome-wide or autosomal heterozygosity). SNP heterozygosity is also biased when differentiated populations are analysed together while autosomal heterozygosity remains unbiased. We also show that when nucleotide sites with missing genotypes are included, observed and expected heterozygosity estimates diverge in proportion to the amount of missing data permitted at each site. We make three recommendations for estimating genome-wide heterozygosity: (a) autosomal heterozygosity should be reported instead of (or in addition to) SNP heterozygosity; (b) sites with any missing data should be omitted and (c) populations should be analysed in independent runs. This should facilitate comparisons within and across studies and between observed and expected measures of heterozygosity.
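The contrast between SNP heterozygosity and autosomal (genome-wide) heterozygosity comes down to which sites enter the average, which a toy calculation makes concrete (an illustrative sketch, not the authors' pipeline; the genotype encoding is invented):

```python
def observed_heterozygosity(genotypes, polymorphic_only):
    """genotypes: list of sites, each site a list of diploid genotypes
    coded as (allele1, allele2) tuples. Returns the mean per-site
    fraction of heterozygous individuals, averaged over polymorphic
    sites only (SNP heterozygosity) or over all sites including
    monomorphic ones (autosomal heterozygosity)."""
    site_h = []
    for site in genotypes:
        alleles = {a for g in site for a in g}
        if polymorphic_only and len(alleles) < 2:
            continue  # drop monomorphic sites, as SNP-only estimates do
        site_h.append(sum(g[0] != g[1] for g in site) / len(site))
    return sum(site_h) / len(site_h)
```

Because the polymorphic-only denominator shrinks as sample size shrinks (rare variants go unobserved), SNP heterozygosity inflates with small N, whereas including monomorphic sites keeps the denominator fixed and the estimate unbiased, which is the paper's first recommendation.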

Journal ArticleDOI
TL;DR: This paper describes phylogenetically aligned component analysis (PACA), a new ordination approach that aligns phenotypic data with phylogenetic signal, and demonstrates with simulated and empirical examples that with PACA it is possible to visualize the trend in phylogenetic signal in multivariate data spaces, irrespective of other signals in the data.
Abstract: Biological phenotypes are highly multivariate, containing sets of traits, or trait dimensions, that covary with one another to a greater or lesser degree (Adams & Collyer, 2019a; Goswami & Polly, 2010; Klingenberg, 2008; Olson & Miller, 1958). Increasingly, evolutionary biologists are characterizing phenotypes multivariately, evaluating trends in more than one trait simultaneously (e.g. Caetano & Harmon, 2019; Friedman et al., 2016; Price et al., 2010), or describing evolutionary changes in complex multidimensional phenotypes (e.g. Felice & Goswami, 2018; Martinez et al., 2018; Sherratt et al., 2016; Zelditch et al., 2017). This, in turn, has led to the development of multivariate phylogenetic comparative methods, which facilitate the evaluation of macroevolutionary trends in multivariate phenotypes across the tree of life and in light of phylogenetic non-independence (e.g. Adams, 2014a, 2014b; Adams & Collyer, 2018b; Bartoszek et al., 2012; Bastide et al., 2018; Revell & Harmon, 2008). Received: 12 April 2020 | Accepted: 25 September 2020 | DOI: 10.1111/2041-210X.13515

Journal ArticleDOI
TL;DR: It is concluded that, with adequate testing and evaluation in an ecological context, a machine learning model can generate labels for direct use in ecological analyses without the need for manual validation.
Abstract: Robin C. Whytock1,2 | Jędrzej Świeżewski3 | Joeri A. Zwerts4 | Tadeusz Bara-Słupski4 | Aurélie Flore Koumba Pambo2 | Marek Rogala3 | Laila Bahaaeldin5 | Kelly Boekee6,7 | Stephanie Brittain8,9 | Anabelle W. Cardoso10 | Philipp Henschel11,12 | David Lehmann1,2 | Brice Momboua2 | Cisquet Kiebou Opepa13 | Christopher Orbell1,11 | Ross T. Pitman11 | Hugh S. Robinson11,14 | Katharine A. Abernethy1,12

Journal ArticleDOI
TL;DR: A new R package is presented—rasterdiv—to calculate heterogeneity indices based on remotely sensed data and an ecological application at the landscape scale is provided and its power in revealing potentially hidden heterogeneity patterns is demonstrated.
Abstract: Ecosystem heterogeneity has been widely recognized as a key ecological indicator of several ecological functions, diversity patterns and change, metapopulation dynamics, population connectivity or gene flow. In this paper, we present a new R package, rasterdiv, to calculate heterogeneity indices based on remotely sensed data. We also provide an ecological application at the landscape scale and demonstrate its power in revealing potentially hidden heterogeneity patterns. The rasterdiv package allows calculating multiple indices, robustly rooted in Information Theory and based on reproducible open-source algorithms.
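A moving-window Shannon index, the simplest of the Information-Theory-rooted indices this family of tools computes, can be sketched as follows (a plain-Python illustration, not rasterdiv's implementation; edge cells are simply skipped rather than padded):

```python
import math

def shannon_window(raster, radius=1):
    """Shannon diversity H = -sum(p * ln p) of the categorical values
    inside a square moving window, computed for each interior cell of a
    2-D grid (list of lists)."""
    rows, cols = len(raster), len(raster[0])
    out = []
    for i in range(radius, rows - radius):
        row = []
        for j in range(radius, cols - radius):
            window = [raster[a][b]
                      for a in range(i - radius, i + radius + 1)
                      for b in range(j - radius, j + radius + 1)]
            n = len(window)
            counts = {v: window.count(v) for v in set(window)}
            row.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
        out.append(row)
    return out
```

A uniform window yields H = 0, while a mixed window yields H > 0, which is how spatial heterogeneity patterns become visible as a raster of index values.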




Journal ArticleDOI
TL;DR: The double permutation procedure provides one potential solution to the elevated type I and type II error rates that arise when testing null hypotheses with social network data, and is suggested to be less likely to produce elevated error rates than node permutations alone, pre-network permutations or node permutations with simple covariates.