Showing papers by "Fredrik Ronquist" published in 2020


Journal ArticleDOI
TL;DR: The history, organisation, methodology and logistics of the SMTP are described, focusing on the rationale for the decisions taken and the lessons learned along the way, and the size and taxonomic composition of the SMTP material are characterised.
Abstract: The Swedish Malaise Trap Project (SMTP) is one of the most ambitious insect inventories ever attempted. The project was designed to target poorly known insect groups across a diverse range of habitats in Sweden. The field campaign involved the deployment of 73 Malaise traps at 55 localities across the country for three years (2003-2006). Over the past 15 years, the collected material has been hand sorted by trained technicians into over 300 taxonomic fractions suitable for expert attention. The resulting collection is a tremendous asset for entomologists around the world, especially as we now face a desperate need for baseline data to evaluate phenomena like insect decline and climate change. Here, we describe the history, organisation, methodology and logistics of the SMTP, focusing on the rationale for the decisions taken and the lessons learned along the way. The SMTP represents one of the early instances of community science applied to large-scale inventory work, with a heavy reliance on volunteers in both the field and the laboratory. We give estimates of both staff effort and volunteer effort involved. The project has been funded by the Swedish Taxonomy Initiative; in total, the inventory has cost less than 30 million SEK (approximately 3.1 million USD). Based on a subset of the samples, we characterise the size and taxonomic composition of the SMTP material. Several different extrapolation methods suggest that the material comprises around 20 million specimens in total. The material is dominated by Diptera (75% of the specimens) and Hymenoptera (15% of specimens). Amongst the Diptera, the dominant groups are Chironomidae (37% of specimens), Sciaridae (15%), Phoridae (13%), Cecidomyiidae (9.5%) and Mycetophilidae (9.4%). Within Hymenoptera, the major groups are Ichneumonidae (44% of specimens), Diaprioidea (19%), Braconidae (9.6%), Platygastroidea (8.5%) and Chalcidoidea (7.9%). The taxonomic composition varies with latitude and season. 
Several Diptera and Hymenoptera groups are more common in non-summer samples (collected from September to April) and in the North, while others show the opposite pattern. About 1% of the total material has been processed and identified by experts so far. This material represents over 4,000 species. One third of these had not been recorded from Sweden before and almost 700 of them are new to science. These results reveal the large amounts of taxonomic work still needed on Palaearctic insect faunas. Based on the SMTP experiences, we discuss aspects of planning and conducting future large-scale insect inventory projects using mainly traditional approaches in relation to more recent approaches that rely on molecular techniques.

62 citations


Journal ArticleDOI
04 Mar 2020-PLOS ONE
TL;DR: The size and composition of the Swedish insect fauna, thought to represent roughly half of the diversity of multicellular life in one of the largest European countries, is reported on, and it is suggested that it comprises around 33,000 species.
Abstract: Despite more than 250 years of taxonomic research, we still have only a vague idea about the true size and composition of the faunas and floras of the planet. Many biodiversity inventories provide limited insight because they focus on a small taxonomic subsample or a tiny geographic area. Here, we report on the size and composition of the Swedish insect fauna, thought to represent roughly half of the diversity of multicellular life in one of the largest European countries. Our results are based on more than a decade of data from the Swedish Taxonomy Initiative and its massive inventory of the country's insect fauna, the Swedish Malaise Trap Project. The fauna is considered one of the best known in the world, but the initiative has nevertheless revealed a surprising amount of hidden diversity: more than 3,000 new species (301 new to science) have been documented so far. Here, we use three independent methods to analyze the true size and composition of the fauna at the family or subfamily level: (1) assessments by experts who have been working on the most poorly known groups in the fauna; (2) estimates based on the proportion of new species discovered in the Malaise trap inventory; and (3) extrapolations based on species abundance and incidence data from the inventory. For the last method, we develop a new estimator, the combined non-parametric estimator, which we show is less sensitive to poor coverage of the species pool than other popular estimators. The three methods converge on similar estimates of the size and composition of the fauna, suggesting that it comprises around 33,000 species. Of those, 8,600 (26%) were unknown at the start of the inventory and 5,000 (15%) still await discovery. We analyze the taxonomic and ecological composition of the estimated fauna, and show that most of the new species belong to Hymenoptera and Diptera groups that are decomposers or parasitoids. 
Thus, current knowledge of the Swedish insect fauna is strongly biased taxonomically and ecologically, and we show that similar but even stronger biases have distorted our understanding of the fauna in the past. We analyze latitudinal gradients in the size and composition of known European insect faunas and show that several of the patterns contradict the Swedish data, presumably due to similar knowledge biases. Addressing these biases is critical in understanding insect biomes and the ecosystem services they provide. Our results emphasize the need to broaden the taxonomic scope of current insect monitoring efforts, a task that is all the more urgent as recent studies indicate a possible worldwide decline in insect faunas.
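
As a point of reference for the third method in the abstract above, the sketch below computes bias-corrected Chao1, one widely used non-parametric richness estimator. It is not the combined non-parametric estimator developed in the paper, whose definition is not given in this listing. The function name and the toy abundance counts are illustrative assumptions.

```python
from collections import Counter


def chao1(abundances):
    """Bias-corrected Chao1 lower-bound estimate of species richness.

    abundances: number of specimens observed for each species
    (hypothetical toy data, not SMTP counts).
    """
    observed = [n for n in abundances if n > 0]
    s_obs = len(observed)
    counts = Counter(observed)
    f1 = counts[1]  # singletons: species seen exactly once
    f2 = counts[2]  # doubletons: species seen exactly twice
    # Bias-corrected form; it stays defined when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


toy_sample = [12, 7, 3, 2, 2, 1, 1, 1, 1]
estimate = chao1(toy_sample)
print(f"observed species: {len(toy_sample)}, Chao1 estimate: {estimate:.1f}")
```

Estimators of this family extrapolate from the rarest species in the sample, which is why poor coverage of the species pool, the problem the abstract highlights, matters so much for them.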

26 citations


Journal ArticleDOI
TL;DR: A Bayesian approach for inferring coevolutionary history based on a model accommodating the complexities of the host ranges of parasites, which suggests that host ranges often include more than one host and evolve via gains and losses of hosts rather than through cospeciation alone.
Abstract: Intimate ecological interactions, such as those between parasites and their hosts, may persist over long time spans, coupling the evolutionary histories of the lineages involved. Most methods that reconstruct the coevolutionary history of such interactions make the simplifying assumption that parasites have a single host. Many methods also focus on congruence between host and parasite phylogenies, using cospeciation as the null model. However, there is an increasing body of evidence suggesting that the host ranges of parasites are more complex: that host ranges often include more than one host and evolve via gains and losses of hosts rather than through cospeciation alone. Here, we develop a Bayesian approach for inferring coevolutionary history based on a model accommodating these complexities. Specifically, a parasite is assumed to have a host repertoire, which includes both potential hosts and one or more actual hosts. Over time, potential hosts can be added or lost, and potential hosts can develop into actual hosts or vice versa. Thus, host colonization is modeled as a two-step process that may potentially be influenced by host relatedness. We first explore the statistical behavior of our model by simulating evolution of host-parasite interactions under a range of parameter values. We then use our approach, implemented in the program RevBayes, to infer the coevolutionary history between 34 Nymphalini butterfly species and 25 angiosperm families. Our analysis suggests that host relatedness among angiosperm families influences how easily Nymphalini lineages gain new hosts. [Ancestral hosts; coevolution; herbivorous insects; probabilistic modeling.].
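
The two-step colonization process described above can be pictured as a three-state Markov chain for a single parasite-host pair. The sketch below is a minimal illustration with made-up rates. It ignores the influence of host relatedness and the requirement that a parasite keep at least one actual host, and it is not the RevBayes implementation; all names are hypothetical.

```python
# States of one host in a parasite's repertoire.
NON_HOST, POTENTIAL, ACTUAL = 0, 1, 2


def repertoire_rate_matrix(gain_potential, lose_potential,
                           become_actual, revert_to_potential):
    """Build a 3x3 instantaneous rate matrix (rows: from-state, columns: to-state).

    All four rates are hypothetical parameters for illustration.
    Direct NON_HOST <-> ACTUAL transitions have rate zero, so becoming
    an actual host always passes through the potential-host state.
    """
    q = [[0.0] * 3 for _ in range(3)]
    q[NON_HOST][POTENTIAL] = gain_potential
    q[POTENTIAL][NON_HOST] = lose_potential
    q[POTENTIAL][ACTUAL] = become_actual
    q[ACTUAL][POTENTIAL] = revert_to_potential
    # Diagonal entries make each row sum to zero, as required of a rate matrix.
    for i in range(3):
        q[i][i] = -sum(q[i][j] for j in range(3) if j != i)
    return q


def one_step_probabilities(q, dt):
    """First-order approximation P = I + Q*dt, reasonable only for small dt."""
    return [[(1.0 if i == j else 0.0) + q[i][j] * dt for j in range(3)]
            for i in range(3)]


Q = repertoire_rate_matrix(0.3, 0.2, 0.5, 0.1)
P = one_step_probabilities(Q, dt=0.01)
for row in P:
    print([round(p, 4) for p in row])
```

The zero entries linking the non-host and actual-host states are what encode the "two-step" assumption: over a short interval, a non-host can only become a potential host.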

25 citations


Journal ArticleDOI
TL;DR: This work introduces a new class of moves, which propose trees based on their parsimony scores, and shows that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves.
Abstract: Sampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology, and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such moves tend to get trapped in tree space, making them slow in finding the globally most probable trees (known as "convergence") and in estimating the correct proportions of the different types of them (known as "mixing"). Here, we introduce a new class of moves, which propose trees based on their parsimony scores. The proposal distribution derived from the parsimony scores is a quickly computable albeit rough approximation of the conditional posterior distribution over candidate trees. We demonstrate with simulations that parsimony-guided moves correctly sample the uniform distribution of topologies from the prior. We then evaluate their performance against standard moves using six challenging empirical data sets, for which we were able to obtain accurate reference estimates of the posterior using long MCMC runs, a mix of topology proposals, and Metropolis coupling. On these data sets, ranging in size from 357 to 934 taxa and from 1740 to 5681 sites, we find that single chains using parsimony-guided moves usually converge an order of magnitude faster than chains using standard moves. They also exhibit better mixing, that is, they cover the most probable trees more quickly. Our results show that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves. Future research will have to show to what extent the performance of such moves can be improved further by finding better ways of approximating the posterior probability, taking the trade-off between accuracy and speed into account. [Bayesian phylogenetic inference; MCMC; parsimony; tree proposal.].
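
The idea of a score-guided proposal can be sketched on a toy problem. The code below is an assumption-laden illustration, not the authors' algorithm: candidate trees are reduced to indices, every other candidate counts as a neighbour, the scores and posterior values are made up, and `weight` is a hypothetical tuning parameter. What it does show is the essential ingredient of any guided move: because the proposal is asymmetric, the acceptance step needs the Hastings ratio.

```python
import math


def proposal_probs(current, scores, weight=1.0):
    """Probability of proposing each candidate tree other than `current`.

    scores[k] is the parsimony score of candidate k; lower scores get
    higher proposal probability.
    """
    others = [k for k in range(len(scores)) if k != current]
    best = min(scores[k] for k in others)
    raw = {k: math.exp(-weight * (scores[k] - best)) for k in others}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def acceptance_prob(current, proposed, scores, posterior, weight=1.0):
    """Metropolis-Hastings acceptance probability with the Hastings correction.

    The ratio q(current | proposed) / q(proposed | current) is included so
    that the chain still targets `posterior` despite the biased proposal.
    """
    q_forward = proposal_probs(current, scores, weight)[proposed]
    q_backward = proposal_probs(proposed, scores, weight)[current]
    ratio = (posterior[proposed] * q_backward) / (posterior[current] * q_forward)
    return min(1.0, ratio)


# Toy example: four candidate topologies with made-up parsimony scores
# and unnormalised posterior densities.
scores = [10, 12, 11, 15]
posterior = [0.4, 0.1, 0.3, 0.2]
print(proposal_probs(0, scores))
print(acceptance_prob(0, 2, scores, posterior))
```

With the correction in place, detailed balance holds for every pair of candidates, so the guided proposal changes how quickly the chain moves but not the distribution it samples.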

19 citations


Posted ContentDOI
06 Mar 2020-bioRxiv
TL;DR: While higher ethanol concentrations positively affect long-term DNA preservation, there is a clear trade-off between preserving insects for morphological examination and for genetic analysis; DNA preserves less well at lower ethanol concentrations when stored at room temperature for an extended period.
Abstract: 1. Traditionally, insects collected for scientific purposes have been dried and pinned, or preserved in 70% ethanol. Both methods preserve taxonomically informative exoskeletal structures well. Hig ...

17 citations


Journal ArticleDOI
TL;DR: The SMTP dataset is unique in that it contains a large proportion of data on previously poorly-known taxa in the Diptera and Hymenoptera, and will be published continuously.
Abstract: Background Despite Sweden's strong entomological tradition, large portions of its insect fauna remain poorly known. As part of the Swedish Taxonomy Initiative, launched in 2002 to document all multi-cellular species occurring in the country, the first taxonomically-broad inventory of the country's insect fauna was initiated, the Swedish Malaise Trap Project (SMTP). In total, 73 Malaise traps were deployed at 55 localities representing a wide range of habitats across the country. Most traps were run continuously from 2003 to 2006 or for a substantial part of that time period. The total catch is estimated to contain 20 million insects, distributed over 1,919 samples (Karlsson et al. 2020). The samples have been sorted into more than 300 taxonomic units, which are made available for expert identification. Thus far, more than 100 taxonomists have been involved in identifying the sorted material, recording the presence of 4,000 species. One third of these had not been recorded from Sweden before and 700 have tentatively been identified as new to science. New information Here, we describe the SMTP dataset, published through the Global Biodiversity Information Facility (GBIF). Data on the sorted material are available in the "SMTP Collection Inventory" dataset. It currently includes more than 130,000 records of taxonomically-sorted samples. Data on the identified material are published using the Darwin Core standard for sample-based data. That information is divided up into group-specific datasets, as the sample set processed for each group is different and in most cases non-overlapping. The current data are divided into 79 taxonomic datasets, largely corresponding to taxonomic sorting fractions. The orders Diptera and Hymenoptera together comprise about 90% of the specimens in the material and these orders are mainly sorted to family or subfamily. The remaining insect taxa are mostly sorted to the order level. 
In total, the 79 datasets currently available comprise around 165,000 specimens, that is, about 1% of the total catch. However, the data are now accumulating rapidly and will be published continuously. The SMTP dataset is unique in that it contains a large proportion of data on previously poorly-known taxa in the Diptera and Hymenoptera.

14 citations


Posted ContentDOI
14 Oct 2020-bioRxiv
TL;DR: In this paper, the authors found that high ethanol concentrations indeed induce brittleness in insects, but the magnitude and nature of the effect, assessed by counting the appendages (legs, wings, antennae, heads) that specimens had lost, varied strikingly among species.
Abstract: 1. Traditionally, insects collected for scientific purposes have been dried and pinned, or preserved in 70 % ethanol. Both methods preserve taxonomically informative exoskeletal structures well but are suboptimal for preserving DNA. Highly concentrated ethanol (95 – 100 %), preferred as a DNA preservative, has generally been assumed to make specimens brittle and prone to breaking. However, systematic studies on the correlation between ethanol concentration and specimen preservation are lacking. 2. We tested how preservative ethanol concentration in combination with different sample handling regimes affect the integrity of seven insect species representing four orders, and differing substantially in the level of sclerotization. After preservation and treatments (various levels of disturbance), we counted the number of appendages (legs, wings, antennae, heads) that specimens had lost. Additionally, we assessed the preservation of DNA after long-term storage by comparing the ratio of PCR amplicon copy numbers to an added artificial standard. 3. We found that high ethanol concentrations indeed induce brittleness in insects. However, the magnitude and nature of the effect varied strikingly among species. In general, ethanol concentrations at or above 90 % made the insects more brittle, but for species with robust, thicker exoskeletons, this did not translate to an increased loss of appendages. Neither freezing nor drying the insects after immersion in ethanol had a negative effect on the retention of appendages. We also found that DNA preserves less well at lower ethanol concentrations when stored at room temperature for an extended period. However, the magnitude of the effect varies among species; the concentrations at which the number of COI amplicon copies relative to the standard was significantly decreased compared to 95 % ethanol ranged from 90 % to as low as 50 %. 4. 
While higher ethanol concentrations positively affect long-term DNA preservation, there is a clear trade-off between preserving insects for morphological examination and genetic analysis. The optimal ethanol concentration for the latter is detrimental for the former, and vice versa. These trade-offs need to be considered in large insect biodiversity surveys and other projects aiming to combine molecular work with traditional morphology-based characterization of collected specimens.

3 citations


Posted ContentDOI
10 Dec 2020-bioRxiv
TL;DR: In this article, the authors show that universal probabilistic programming languages (PPLs) solve the expressivity problem of phylogenetic analysis, while still supporting automated generation of efficient inference algorithms.
Abstract: Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.
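
To make the link between SMC and Bayes factors concrete: an SMC run yields, as a by-product, an estimate of the marginal likelihood, namely the product over steps of the mean unnormalised incremental particle weight. The sketch below shows that bookkeeping in log space and the resulting log Bayes factor. It assumes resampling after every step; the function names and toy weights are made up for illustration, and this is not the code generated by the PPL systems described in the abstract.

```python
import math


def log_mean_exp(log_values):
    """Numerically stable log of the mean of exp(log_values)."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))


def smc_log_marginal_likelihood(log_weights_per_step):
    """Sum over steps of the log mean incremental weight.

    Assumes particles are resampled after every step, so each inner list
    holds the incremental (unnormalised) log-weights of that step.
    """
    return sum(log_mean_exp(step) for step in log_weights_per_step)


def log_bayes_factor(log_z_model_a, log_z_model_b):
    """Positive values favour model A over model B."""
    return log_z_model_a - log_z_model_b


# Made-up log-weights for two models, three particles, two steps each.
model_a = [[-1.0, -1.2, -0.9], [-2.0, -2.1, -1.8]]
model_b = [[-1.5, -1.7, -1.4], [-2.6, -2.4, -2.9]]
log_bf = log_bayes_factor(
    smc_log_marginal_likelihood(model_a),
    smc_log_marginal_likelihood(model_b),
)
print(f"log Bayes factor (A vs B): {log_bf:.3f}")
```

Working in log space avoids underflow when many small weights are multiplied, which is the usual situation for likelihoods on phylogenies.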

3 citations


Posted ContentDOI
01 Dec 2020-bioRxiv
TL;DR: In this article, the authors develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models, supporting both parameter inference and efficient Bayesian model testing.
Abstract: Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here we show that universal probabilistic programming languages (PPLs) solve the modeling language expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient Bayesian model testing. We then automatically generate SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.

3 citations


Posted ContentDOI
15 Oct 2020-bioRxiv
TL;DR: This work develops automated generation of sequential Monte Carlo algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.
Abstract: Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here we show that universal probabilistic programming languages (PPLs) solve the modeling language expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient Bayesian model testing. We then automatically generate SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.

1 citation