scispace - formally typeset
Author

Ricardo Scachetti-Pereira

Bio: Ricardo Scachetti-Pereira is an academic researcher from the University of Kansas. The author has contributed to research in the topics Environmental niche modelling and Ecological niche. The author has an h-index of 5 and has co-authored 5 publications, which have received 6,891 citations.

Papers
Journal Article
TL;DR: This work compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date, and found that presence-only data were effective for modelling species' distributions for many species and regions.
Abstract: Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.

7,589 citations
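The abstract above says the models were fitted to presence-only data and evaluated against independent presence-absence data, but it does not name the evaluation statistic. One common choice for this kind of test is the area under the ROC curve (AUC). The minimal Python sketch below, an illustration rather than the study's own code, computes AUC as the probability that a randomly chosen presence site receives a higher model score than a randomly chosen absence site, with ties counting one half. The function name `auc` and the example scores are hypothetical.

```python
from itertools import product


def auc(presence_scores, absence_scores):
    """Pairwise AUC: share of (presence, absence) pairs in which the
    presence site scores higher; ties count as half a win."""
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    for p, a in product(presence_scores, absence_scores):
        if p > a:
            wins += 1.0
        elif p == a:
            wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical model outputs at independent presence and absence sites.
print(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

In the example, 8 of the 9 presence-absence pairs are ranked correctly, so the function returns 8/9 (about 0.889). The pairwise loop costs O(n·m) comparisons, which is fine for small evaluation sets; a rank-based (Mann-Whitney) formulation scales better.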

Journal Article
TL;DR: This paper applies ecological niche modeling with the genetic algorithm for rule-set prediction (GARP) to predict the potential distributions of largemouth and smallmouth bass in Japan, and finds that the predictions were statistically significant for both species.
Abstract: Largemouth bass Micropterus salmoides and smallmouth bass M. dolomieu have been introduced into freshwater habitats in Japan, with potentially serious consequences for native fish populations. In this paper we apply the technique of ecological niche modeling using the genetic algorithm for rule-set prediction (GARP) to predict the potential distributions of these two species in Japan. This algorithm constructs a niche model based on point occurrence records and ecological coverages. The model can be visualized in geographic space, yielding a prediction of potential geographic range. The model can then be tested by determining how well independent point occurrence data are predicted according to the criteria of sensitivity and specificity provided by receiver–operator curve analysis. We ground-truthed GARP's ability to forecast the geographic occurrence of each species in its native range. The predictions were statistically significant for both species (P < 0.001). We projected the niche models on...

108 citations
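The GARP abstract tests each model by how well independent occurrence points are predicted, using the criteria of sensitivity and specificity. The sketch below shows how those two rates are computed for binary predictions, assuming hypothetical test sites where presence (1) or absence (0) is known; the function name and data are illustrative, and GARP itself is not reproduced. Under this definition the omission error rate is 1 − sensitivity.

```python
def sensitivity_specificity(predicted, observed):
    """Rates for binary predictions (1 = present, 0 = absent) at test
    sites whose presence/absence is known."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == 1 and o == 1)  # presences predicted present
    fn = sum(1 for p, o in pairs if p == 0 and o == 1)  # presences missed (omission)
    tn = sum(1 for p, o in pairs if p == 0 and o == 0)  # absences predicted absent
    fp = sum(1 for p, o in pairs if p == 1 and o == 0)  # absences predicted present
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed presence and one observed absence")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictions and observations at eight test sites.
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
observed = [1, 1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(predicted, observed)
print(sens, spec, "omission rate:", 1 - sens)
```

Here 3 of the 4 observed presences and 3 of the 4 observed absences are predicted correctly, so both rates are 0.75 and the omission rate is 0.25.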

Journal Article
TL;DR: In this paper, the authors developed predictive models of ecological and spatial distributions of capybaras (Hydrochoerus hydrochaeris) using ecological niche modeling, and found that most occurrences of the animals were in flat areas with water bodies surrounded by sugarcane and pasture.
Abstract: Southeastern Brazil has seen dramatic landscape modifications in recent decades, due to expansion of agriculture and urban areas; these changes have influenced the distribution and abundance of vertebrates. We developed predictive models of ecological and spatial distributions of capybaras (Hydrochoerus hydrochaeris) using ecological niche modeling. Most occurrences of capybaras were in flat areas with water bodies surrounded by sugarcane and pasture. More than 75% of the Piracicaba River basin was estimated as potentially habitable by capybara. The models had low omission error (2.3–3.4%), but higher commission error (91.0–98.5%); these “model failures” seem to be more related to local habitat characteristics than to spatial ones. The potential distribution of capybaras in the basin is associated with anthropogenic habitats, particularly with intensive land use for agriculture.

31 citations

Journal Article
TL;DR: The potential of Homalodisca coagulata to invade South America is a question of economic importance, given its potential impact as a disease vector for several crops.
Abstract: The potential of Homalodisca coagulata to invade South America is a question of economic importance, given its potential impact as a disease vector for several crops. We developed ecological niche models for the species on its native geographic distribution in the southeastern United States; we tested the predictivity of the models both on the native distributional area and via projections to California, where the species has long been present as an invasive species. In both cases, tests indicated high statistical significance of predictions. Projection of models to South America indicated little possibility of invasion of southeastern Brazil, where citrus diseases were of concern. However, all models agree in predicting great risk of establishment in the wine-growing regions of northern Argentina and extreme southern Brazil; great precaution is thus to be recommended when any movements of bio-materials are made from infected areas to this region.

23 citations

Journal Article
01 May 2006
TL;DR: This work investigated the potential geographic range of the invasive paleotropical weed smooth crotalaria in protected natural areas across Brazil, and found that the weed appears more likely to occur in open and highly fragmented areas than in extensive closed forests.
Abstract: Alien weed species rank among the most important threats to conservation of biodiversity, making understanding the extent to which protected natural areas are vulnerable to invasion by weeds pivotal in long-term maintenance and conservation of biodiversity. We investigated the potential geographic range of the invasive paleotropical weed, smooth crotalaria, in protected natural areas across Brazil. The ecological niche dimensions of smooth crotalaria in Africa (its putative original distribution) were modeled using a genetic algorithm. Models for the native range and their projections to South America showed good predictive ability when challenged with independent occurrence data. All Brazilian protected natural areas were predicted as highly vulnerable to invasion by this species. However, smooth crotalaria appears more likely to occur in open (savanna-like vegetation, such as cerrado and pantanal) and highly fragmented (Atlantic forest) areas than in extensive closed forests (Amazon). Managemen...

16 citations


Cited by
Journal Article
25 Apr 2013-Nature
TL;DR: These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue and will help to guide improvements in disease control strategies using vaccine, drug and vector control methods, and in their economic evaluation.
Abstract: Dengue is a systemic viral infection transmitted between humans by Aedes mosquitoes. For some patients, dengue is a life-threatening illness. There are currently no licensed vaccines or specific therapeutics, and substantial vector control efforts have not stopped its rapid emergence and global spread. The contemporary worldwide distribution of the risk of dengue virus infection and its public health burden are poorly known. Here we undertake an exhaustive assembly of known records of dengue occurrence worldwide, and use a formal modelling framework to map the global distribution of dengue risk. We then pair the resulting risk map with detailed longitudinal information from dengue cohort studies and population surfaces to infer the public health burden of dengue in 2010. We predict dengue to be ubiquitous throughout the tropics, with local spatial variations in risk influenced strongly by rainfall, temperature and the degree of urbanization. Using cartographic approaches, we estimate there to be 390 million (95% credible interval 284-528) dengue infections per year, of which 96 million (67-136) manifest apparently (any level of disease severity). This infection total is more than three times the dengue burden estimate of the World Health Organization. Stratification of our estimates by country allows comparison with national dengue reporting, after taking into account the probability of an apparent infection being formally reported. The most notable differences are discussed. These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue. We anticipate that they will provide a starting point for a wider discussion about the global impact of this disease and will help to guide improvements in disease control strategies using vaccine, drug and vector control methods, and in their economic evaluation.

7,238 citations

Journal Article
TL;DR: Methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection; the results highlight the value of GLM in combination with penalised methods and threshold-based pre-selection when omitted variables are considered in the final interpretation.
Abstract: Collinearity refers to the non-independence of predictor variables, usually in a regression-type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time, and predicted to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and through time. Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors, threshold-based pre-selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor-response relationships of increasing complexity and eight levels of collinearity, we compared ways to address collinearity with standard multiple regression and machine-learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating its performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold-based pre-selection when omitted variables are considered in the final interpretation. However, all approaches tested yielded degraded predictions under change in collinearity structure, and the ‘folk lore’ threshold for correlation coefficients between predictor variables of |r| > 0.7 was an appropriate indicator of when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre-analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.

6,199 citations
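The abstract reports that threshold-based pre-selection performed well and that the common threshold of |r| > 0.7 between predictors is a reasonable signal that collinearity begins to distort estimation. A minimal Python sketch of such a pre-selection, assuming NumPy is available: walk through the predictors in a priority order and keep a predictor only if its absolute Pearson correlation with every predictor already kept is at most 0.7. Using the column order as the priority order is an assumption made here for simplicity; in practice that order would come from ecological judgment or univariate importance. The function `select_by_threshold` and the simulated variables are hypothetical.

```python
import numpy as np


def select_by_threshold(X, names, threshold=0.7):
    """Keep predictors in column order; drop any whose absolute Pearson
    correlation with an already-kept predictor exceeds `threshold`."""
    r = np.corrcoef(X, rowvar=False)  # predictors are columns
    kept = []
    for j in range(X.shape[1]):
        if all(abs(r[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


rng = np.random.default_rng(0)
temp = rng.normal(size=200)
temp_proxy = temp + 0.1 * rng.normal(size=200)  # almost a copy of temp
rain = rng.normal(size=200)                     # independent of temp
X = np.column_stack([temp, temp_proxy, rain])
print(select_by_threshold(X, ["temp", "temp_proxy", "rain"]))
```

With these simulated data the sketch prints `['temp', 'rain']`: the near-duplicate `temp_proxy` is dropped, while the independent `rain` survives. As the abstract cautions, such screening reduces but does not remove the problems caused by a change in collinearity structure between training and prediction data.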

Journal Article
TL;DR: This paper presents a tuning method that uses presence-only data for parameter tuning, and introduces several concepts that improve the predictive accuracy and running time of Maxent and describes a new logistic output format that gives an estimate of probability of presence.
Abstract: Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) "target-group" background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.

5,314 citations
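The Maxent abstract introduces "hinge features" for modelling more complex relationships. A hinge feature is a piecewise-linear function of one environmental variable: zero on one side of a knot and linear on the other. The NumPy sketch below assumes a variant that rescales each hinge to reach 1 at the variable's observed maximum (forward hinge) or minimum (reverse hinge); Maxent's own knot placement, regularisation and fitting are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def forward_hinge(x, knot, x_max):
    """Zero at or below the knot, then rising linearly to 1 at x_max."""
    return np.clip((x - knot) / (x_max - knot), 0.0, None)


def reverse_hinge(x, knot, x_min):
    """1 at x_min, falling linearly to 0 at the knot, and zero above it."""
    return np.clip((knot - x) / (knot - x_min), 0.0, None)


x = np.array([0.0, 5.0, 10.0, 15.0, 20.0])  # e.g. points along a temperature gradient
print(forward_hinge(x, knot=10.0, x_max=x.max()))
print(reverse_hinge(x, knot=10.0, x_min=x.min()))
```

On this grid with a knot at 10, the forward hinge gives 0, 0, 0, 0.5, 1 and the reverse hinge gives 1, 0.5, 0, 0, 0. A weighted sum of many such hinges at different knots yields a flexible piecewise-linear response curve, which is what lets hinge features capture relationships that a single linear term cannot.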

Journal Article
TL;DR: Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates, and are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time.
Abstract: Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates. They are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time. SDMs are now widely used across terrestrial, freshwater, and marine realms. Differences in methods between disciplines reflect both differences in species mobility and in “established use.” Model realism and robustness are influenced by selection of relevant predictors and modeling method, consideration of scale, how the interplay between environmental and geographic factors is handled, and the extent of extrapolation. Current linkages between SDM practice and ecological theory are often weak, hindering progress. Remaining challenges include: improvement of methods for modeling presence-only data and for model selection and evaluation; accounting for biotic interactions; and assessing model uncertainty.

5,076 citations

Journal Article
TL;DR: This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model.
Abstract: Summary
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions.
2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion.
3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods.
4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13,000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. We provide code and a tutorial to enable the wider use of BRT by ecologists.

4,787 citations
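The BRT abstract describes the final model as an additive regression model whose terms are simple trees fitted in a forward, stagewise fashion. The NumPy sketch below illustrates that idea under simplifying assumptions that differ from the paper's setup: squared-error loss rather than a loss suited to presence-absence data, single-split trees (stumps), a fixed learning rate, and no random subsampling. Each round fits a stump to the current residuals (the negative gradient of squared-error loss) and adds a shrunken copy of it to the model. All function names are hypothetical.

```python
import numpy as np


def fit_stump(X, residual):
    """Find the single split (feature j, threshold t) that best fits the
    residuals by least squares; each side predicts its mean residual."""
    best = None
    for j in range(X.shape[1]):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = residual[left].mean(), residual[~left].mean()
            sse = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        # No feature has two distinct values: fall back to a constant term.
        m = residual.mean()
        return 0, np.inf, m, m
    return best[1:]


def boost(X, y, n_trees=100, learning_rate=0.1):
    """Forward stagewise fitting: start from the mean, then repeatedly add
    a shrunken stump fitted to the current residuals."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_trees):
        residual = y - pred  # negative gradient of squared-error loss
        j, t, lv, rv = fit_stump(X, residual)
        pred += learning_rate * np.where(X[:, j] <= t, lv, rv)
        stumps.append((j, t, lv, rv))
    return f0, learning_rate, stumps


def predict(model, X):
    """Sum the initial constant and every shrunken stump."""
    f0, learning_rate, stumps = model
    pred = np.full(X.shape[0], f0)
    for j, t, lv, rv in stumps:
        pred += learning_rate * np.where(X[:, j] <= t, lv, rv)
    return pred


# Toy example: a nonlinear response to a single predictor.
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
model = boost(X, y, n_trees=200, learning_rate=0.1)
print("training MSE:", np.mean((predict(model, X) - y) ** 2))
```

With squared-error loss and a learning rate between 0 and 1, adding a shrunken least-squares stump never increases the training error, so the fit improves steadily as trees are added. In practice the number of trees and the learning rate are tuned against held-out data to avoid overfitting.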