Journal Article

Maximum entropy modeling of species geographic distributions

TL;DR: This paper introduces the maximum entropy method (Maxent) for modeling species geographic distributions with presence-only data. Maxent is a general-purpose machine learning method with a simple and precise mathematical formulation.
About: This article was published in Ecological Modelling on 2006-01-25 and is currently open access. It has received 13,120 citations to date. The article focuses on the topics: Environmental niche modelling & Species distribution.
Citations
Journal Article
TL;DR: This work compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date and found that presence-only data were effective for modelling species' distributions for many species and regions.
Abstract: Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.

7,589 citations

Journal Article
TL;DR: An overview of recent advances in species distribution models (SDMs) is provided, and new avenues for incorporating species migration, population dynamics, biotic interactions and community ecology into SDMs at multiple spatial scales are suggested.
Abstract: In the last two decades, interest in species distribution models (SDMs) of plants and animals has grown dramatically. Recent advances in SDMs allow us to potentially forecast anthropogenic effects on patterns of biodiversity at different spatial scales. However, some limitations still preclude the use of SDMs in many theoretical and practical applications. Here, we provide an overview of recent advances in this field, discuss the ecological principles and assumptions underpinning SDMs, and highlight critical limitations and decisions inherent in the construction and evaluation of SDMs. Particular emphasis is given to the use of SDMs for the assessment of climate change impacts and conservation management issues. We suggest new avenues for incorporating species migration, population dynamics, biotic interactions and community ecology into SDMs at multiple spatial scales. Addressing all these issues requires a better integration of SDMs with ecological theory.

5,620 citations

Journal Article
TL;DR: This paper presents a tuning method that uses presence-only data, introduces several concepts that improve the predictive accuracy and running time of Maxent, and describes a new logistic output format that gives an estimate of probability of presence.
Abstract: Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. 
Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) "target-group" background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.
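The "hinge features" named in this abstract can be pictured as piecewise-linear functions of a single environmental variable. The following is a minimal sketch, assuming the common "forward hinge" form (zero below a knot, then rising linearly to 1 at the variable's maximum). The function name `forward_hinge`, the knot and the temperature values are hypothetical illustrations, not taken from the Maxent software or from the paper.

```python
def forward_hinge(x, knot, x_max):
    """Illustrative forward hinge: 0 at or below `knot`, linear up to 1 at `x_max`."""
    if x_max <= knot:
        raise ValueError("x_max must be greater than knot")
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)


# Hypothetical temperatures (degrees C) at five sites; not real data.
temperatures = [5.0, 10.0, 15.0, 20.0, 25.0]
hinge_values = [forward_hinge(t, knot=10.0, x_max=25.0) for t in temperatures]
print(hinge_values)
```

A model that is linear in several such features with different knots can represent a piecewise-linear response to the variable, which is one way to read the abstract's phrase "more complex relationships".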

5,314 citations


Cites methods from "Maximum entropy modeling of species..."

  • ...Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation...

    [...]

Journal Article
TL;DR: Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates. They are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time.
Abstract: Species distribution models (SDMs) are numerical tools that combine observations of species occurrence or abundance with environmental estimates. They are used to gain ecological and evolutionary insights and to predict distributions across landscapes, sometimes requiring extrapolation in space and time. SDMs are now widely used across terrestrial, freshwater, and marine realms. Differences in methods between disciplines reflect both differences in species mobility and in “established use.” Model realism and robustness is influenced by selection of relevant predictors and modeling method, consideration of scale, how the interplay between environmental and geographic factors is handled, and the extent of extrapolation. Current linkages between SDM practice and ecological theory are often weak, hindering progress. Remaining challenges include: improvement of methods for modeling presence-only data and for model selection and evaluation; accounting for biotic interactions; and assessing model uncertainty.

5,076 citations


Cites background or methods from "Maximum entropy modeling of species..."

  • ...…Frescino 2002), classification and regression trees and ensembles of trees (random forests: Prasad et al. 2006; boosted regression trees: Elith et al. 2008), genetic algorithms (Stockwell & Peters 1999), support vector machines (Drake et al. 2006), and maximum entropy models (Phillips et al. 2006)....

    [...]

  • ...Where analytical methods were once restricted to envelopes and distance measures, comparison of presence records with background or pseudoabsence points is now common (e.g., using GARP, ENFA, MaxEnt, and regression methods)....

    [...]

  • ...In machine learning these ideas of model selection and tuning are termed “regularization,” i.e., making the fitted surface more regular or smooth by controlling overfitting (e.g., used in MaxEnt, Phillips et al. 2006)....

    [...]

  • ...The key structural features of GLMs (non-normal error distributions, additive terms, nonlinear fitted functions) continue to be useful and are part of many current methods including RSFs (Manly et al. 2002) and maximum entropy models (MaxEnt; Phillips et al. 2006)....

    [...]

Journal Article
TL;DR: A new statistical explanation of MaxEnt is described, showing that the model minimizes the relative entropy between two probability densities defined in covariate space, which is likely to be a more accessible way to understand the model than previous ones that rely on machine learning concepts.
Abstract: MaxEnt is a program for modelling species distributions from presence-only species records. This paper is written for ecologists and describes the MaxEnt model from a statistical perspective, making explicit links between the structure of the model, decisions required in producing a modelled distribution, and knowledge about the species and the data that might affect those decisions. To begin we discuss the characteristics of presence-only data, highlighting implications for modelling distributions. We particularly focus on the problems of sample bias and lack of information on species prevalence. The keystone of the paper is a new statistical explanation of MaxEnt which shows that the model minimizes the relative entropy between two probability densities (one estimated from the presence data and one, from the landscape) defined in covariate space. For many users, this viewpoint is likely to be a more accessible way to understand the model than previous ones that rely on machine learning concepts. We then step through a detailed explanation of MaxEnt describing key components (e.g. covariates and features, and definition of the landscape extent), the mechanics of model fitting (e.g. feature selection, constraints and regularization) and outputs. Using case studies for a Banksia species native to south-west Australia and a riverine fish, we fit models and interpret them, exploring why certain choices affect the result and what this means. The fish example illustrates use of the model with vector data for linear river segments rather than raster (gridded) data. Appropriate treatments for survey bias, unprojected data, locally restricted species, and predicting to environments outside the range of the training data are demonstrated, and new capabilities discussed. Online appendices include additional details of the model and the mathematical links between previous explanations and this one, example code and data, and further information on the case studies.
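The link between maximum entropy and minimum relative entropy mentioned in this abstract can be checked numerically in a simplified setting. The sketch below assumes a discrete distribution over a handful of landscape cells and a uniform reference distribution, which is a simplification of the paper's densities in covariate space. The helper names and the example probabilities are hypothetical. It verifies the identity H(p) = log(n) - D(p || uniform), so that maximizing entropy and minimizing relative entropy to the uniform distribution select the same p.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """Relative entropy D(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distribution over four landscape cells; not real data.
p = [0.1, 0.2, 0.3, 0.4]
uniform = [1.0 / len(p)] * len(p)

lhs = entropy(p)
rhs = math.log(len(p)) - relative_entropy(p, uniform)
print(lhs, rhs)
```

Because log(n) is a constant for a fixed landscape, any p that increases the entropy decreases the relative entropy by the same amount.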

4,621 citations


Cites background or methods from "Maximum entropy modeling of species..."

  • ...MaxEnt (Phillips et al., 2006; Phillips & Dudík, 2008) is one such method and is the focus of this paper....

    [...]

  • ...…or coefficients: These are the parameters of the model that weight the contribution of each feature. λ in previous papers*, β in this paper (*Phillips et al. (2006), Phillips & Dudík (2008)) … tuning parameter λ.…...

    [...]

  • ...Note also that the AUC in this case is calculated on presence vs. background data (Phillips et al., 2006)....

    [...]

  • ...This was called the ‘‘raw’’ distribution (Phillips et al., 2006), and gave the probability, given the species is present, that it is found at pixel x. Maximizing the entropy of the raw distribution is equivalent to minimizing the relative entropy of f1(z) relative to f(z), so the two formulations…...

    [...]

  • ...The MaxEnt model – a short overview: Previous papers have described MaxEnt as estimating a distribution across geographic space (Phillips et al., 2006; Phillips & Dudík, 2008)....

    [...]

References
Journal Article
TL;DR: This final installment of the paper considers the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now.
Abstract: In this final installment of the paper we consider the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now. To a considerable extent the continuous case can be obtained through a limiting process from the discrete case by dividing the continuum of messages and signals into a large but finite number of small regions and calculating the various parameters involved on a discrete basis. As the size of the regions is decreased these parameters in general approach as limits the proper values for the continuous case. There are, however, a few new effects that appear and also a general change of emphasis in the direction of specialization of the general results to particular cases.

65,425 citations

Journal Article
TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
Abstract: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect difference...
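The ranking interpretation of the area under the ROC curve described in this abstract can be illustrated directly: count the fraction of (positive, negative) pairs in which the positive case receives the higher score. This is a minimal sketch that counts ties as one half, the usual convention for the Wilcoxon (Mann-Whitney) statistic. The function name `auc_by_ranking` and the scores are hypothetical; in the presence-only setting of the main paper the "negative" scores would come from background points rather than true absences.

```python
def auc_by_ranking(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model outputs; not real data.
presence_scores = [0.9, 0.8, 0.4]
background_scores = [0.7, 0.3, 0.4, 0.1]
auc = auc_by_ranking(presence_scores, background_scores)
print(auc)
```

The double loop is quadratic in the number of scores, which is fine for illustration; rank-based formulas give the same value more efficiently on large samples.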

19,398 citations


"Maximum entropy modeling of species..." refers to methods in this paper

  • ...ROC analysis was developed in signal processing and is widely used in clinical medicine (Hanley and McNeil, 1982, 1983; Zweig and Campbell, 1993)....

    [...]

  • ...Each partition was created by randomly selecting 70% of the occurrence localities as training data, with the remaining 30% reserved for testing the resulting models....

    [...]
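The 70%/30% random partition of occurrence localities described in the excerpt above can be sketched as follows. The function name `split_occurrences`, the seed and the site identifiers are hypothetical and only illustrate the idea; the paper's own partitioning code is not shown in this listing.

```python
import random


def split_occurrences(localities, train_fraction=0.7, seed=None):
    """Randomly split localities into (training, testing) lists."""
    rng = random.Random(seed)   # local generator so the split is reproducible
    shuffled = list(localities)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical occurrence identifiers; not real data.
localities = [f"site_{i:02d}" for i in range(20)]
train, test = split_occurrences(localities, seed=0)
print(len(train), len(test))
```

Calling the function again with a different seed yields another random partition, which is one way to produce several partitions of the same occurrence set.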

Journal Article
TL;DR: A nonparametric approach to the analysis of areas under correlated ROC curves is presented, by using the theory on generalized U-statistics to generate an estimated covariance matrix.
Abstract: Methods of evaluating and comparing the performance of diagnostic tests are of increasing importance as new tests are developed and marketed. When a test is based on an observed variable that lies on a continuous or graded scale, an assessment of the overall value of the test can be made through the use of a receiver operating characteristic (ROC) curve. The curve is constructed by varying the cutpoint used to determine which values of the observed variable will be considered abnormal and then plotting the resulting sensitivities against the corresponding false positive rates. When two or more empirical curves are constructed based on tests performed on the same individuals, statistical analysis on differences between curves must take into account the correlated nature of the data. This paper presents a nonparametric approach to the analysis of areas under correlated ROC curves, by using the theory on generalized U-statistics to generate an estimated covariance matrix.

16,496 citations


"Maximum entropy modeling of species..." refers to methods in this paper

  • ...It uses a non-parametric test (DeLong et al., 1988) to determine whether one prediction is significantly better than another when using correlated samples (i.e., with both predictions evaluated on the same test instances), and reports the result as a χ² statistic and corresponding p value....

    [...]

Journal Article
E. T. Jaynes
TL;DR: In this article, the authors consider statistical mechanics as a form of statistical inference rather than as a physical theory, and show that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle.
Abstract: Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum-entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information. If one considers statistical mechanics as a form of statistical inference rather than as a physical theory, it is found that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle. In the resulting "subjective statistical mechanics," the usual rules are thus justified independently of any physical argument, and in particular independently of experimental verification; whether or not the results agree with experiment, they still represent the best estimates that could have been made on the basis of the information available. It is concluded that statistical mechanics need not be regarded as a physical theory dependent for its validity on the truth of additional assumptions not contained in the laws of mechanics (such as ergodicity, metric transitivity, equal a priori probabilities, etc.). Furthermore, it is possible to maintain a sharp distinction between its physical and statistical aspects. The former consists only of the correct enumeration of the states of a system and their properties; the latter is a straightforward example of statistical inference.
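The maximum-entropy estimate described in this abstract can be made concrete with a small numerical example. The sketch below uses a six-sided die whose only known property is its mean, a standard textbook illustration that is not part of the abstract itself. It assumes the well-known result that, under a single mean constraint, the maximum-entropy distribution has the exponential (Gibbs) form p_i proportional to exp(lam * x_i), with normalization by the partition function. The function names, the search bounds for `lam` and the target mean of 4.5 are illustrative assumptions.

```python
import math


def gibbs_distribution(values, lam):
    """Distribution with p_i proportional to exp(lam * value_i)."""
    weights = [math.exp(lam * v) for v in values]
    z = sum(weights)  # partition function
    return [w / z for w in weights]


def mean(values, probs):
    return sum(v * p for v, p in zip(values, probs))


def max_entropy_with_mean(values, target_mean, lo=-10.0, hi=10.0, tol=1e-12):
    """Find lam by bisection; the Gibbs mean increases monotonically with lam."""
    if not min(values) < target_mean < max(values):
        raise ValueError("target_mean must lie strictly inside the value range")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean(values, gibbs_distribution(values, mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs_distribution(values, (lo + hi) / 2.0)


faces = [1, 2, 3, 4, 5, 6]
probs = max_entropy_with_mean(faces, target_mean=4.5)
print([round(p, 4) for p in probs])
```

With a target mean of 3.5 the same routine returns the uniform distribution, which reflects the "maximally noncommittal" character of the estimate: the constraint then carries no information beyond what a fair die already satisfies.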

12,099 citations


"Maximum entropy modeling of species..." refers to background in this paper

  • ...Its origins lie in statistical mechanics (Jaynes, 1957), and it remains an active area of research with an Annual Conference, Maximum Entropy and Bayesian Methods, that explores applications in diverse areas such as astronomy, portfolio optimization, image reconstruction, statistical physics and signal processing....

    [...]

  • ...E.T. Jaynes gave a general answer to this question: the best approach is to ensure that the approximation satisfies any constraints on the unknown distribution that we are aware of, and that subject to those constraints, the distribution should have maximum entropy (Jaynes, 1957)....

    [...]

Journal Article
08 Jan 2004 - Nature
TL;DR: Estimates of extinction risks for sample regions that cover some 20% of the Earth's terrestrial surface show the importance of rapid implementation of technologies to decrease greenhouse gas emissions and strategies for carbon sequestration.
Abstract: Climate change over the past approximately 30 years has produced numerous shifts in the distributions and abundances of species and has been implicated in one species-level extinction. Using projections of species' distributions for future climate scenarios, we assess extinction risks for sample regions that cover some 20% of the Earth's terrestrial surface. Exploring three approaches in which the estimated probability of extinction shows a power-law relationship with geographical range size, we predict, on the basis of mid-range climate-warming scenarios for 2050, that 15-37% of species in our sample of regions and taxa will be 'committed to extinction'. When the average of the three methods and two dispersal scenarios is taken, minimal climate-warming scenarios produce lower projections of species committed to extinction (approximately 18%) than mid-range (approximately 24%) and maximum-change (approximately 35%) scenarios. These estimates show the importance of rapid implementation of technologies to decrease greenhouse gas emissions and strategies for carbon sequestration.

7,089 citations


"Maximum entropy modeling of species..." refers to background in this paper

  • ...This is important for applications such as invasive-species management (e.g., Peterson and Robins, 2003) and predicting the impact of climate change (e.g., Thomas et al., 2004)....

    [...]