Author

Lúcia G. Lohmann

Bio: Lúcia G. Lohmann is an academic researcher at the University of São Paulo. The author has contributed to research on the topics Anemopaegma and Genus. The author has an h-index of 29 and has co-authored 131 publications receiving 11766 citations. Previous affiliations of Lúcia G. Lohmann include University of Missouri–St. Louis and Missouri Botanical Garden.
Topics: Anemopaegma, Genus, NdhF, Bignonia, Biodiversity


Papers
Journal ArticleDOI
TL;DR: This work compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date and found that presence-only data were effective for modelling species' distributions for many species and regions.
Abstract: Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.

7,589 citations

Journal ArticleDOI
01 Jan 2015
TL;DR: An updated inventory of Brazilian seed plants is presented, offering important insights into the country's biodiversity. The work started in 2010 with the publication of the Plants and Fungi Catalogue and has since been updated by more than 430 specialists working online.
Abstract: An updated inventory of Brazilian seed plants is presented and offers important insights into the country's biodiversity. This work started in 2010, with the publication of the Plants and Fungi Catalogue, and has been updated since by more than 430 specialists working online. Brazil is home to 32,086 native Angiosperms and 23 native Gymnosperms, showing an increase of 3% in its species richness in relation to 2010. The Amazon Rainforest is the richest Brazilian biome for Gymnosperms, while the Atlantic Rainforest is the richest one for Angiosperms. There was a considerable increment in the number of species and endemism rates for biomes, except for the Amazon that showed a decrease of 2.5% of recorded endemics. However, well over half of Brazilian seed plant species (57.4%) are endemic to this territory. The proportion of life-forms varies among different biomes: trees are more expressive in the Amazon and Atlantic Rainforest biomes while herbs predominate in the Pampa, and lianas are more expressive in the Amazon, Atlantic Rainforest, and Pantanal. This compilation serves not only to quantify Brazilian biodiversity, but also to highlight areas where information is lacking and to provide a framework for the challenge faced in conserving Brazil's unique and diverse flora.

1,123 citations

Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the influence of the number of training points and climate bias in training points, elevation, and range size on model performance using analysis of variance models.
Abstract: Aim: Species distribution models and geographical information system (GIS) technologies are becoming increasingly important tools in conservation planning and decision-making. Often the rich data bases of museums and herbaria serve as the primary data for predicting species distributions. Yet key assumptions about the primary data often are untested, and violation of such assumptions may have consequences for model predictions. For example, users of primary data assume that sampling has been random with respect to geography and environmental gradients. Here we evaluate the assumption that plant voucher specimens adequately sample the climatic gradient and test whether violation of this assumption influences model predictions. Location: Bolivia and Ecuador. Methods: Using 323,711 georeferenced herbarium collections and nine climatic variables, we predicted the distribution of 76 plant species using maximum entropy models (MAXENT) with training points that sampled the climate environments randomly and training points that reflected the climate bias in the herbarium collections. To estimate the distribution of species, MAXENT finds the distribution of maximum entropy (i.e. closest to uniform) subject to the constraint that the expected value for each environmental variable under the estimated distribution matches its empirical average. The experimental design included species that differed in geographical range and elevation; all species were modelled with 20 and 100 training points. We examined the influence of the number of training points and climate bias in training points, elevation and range size on model performance using analysis of variance models. Results: We found that significant parts of the climatic gradient were poorly represented in herbarium collections for both countries. For the most part, existing climatic bias in collections did not greatly affect distribution predictions when compared with an unbiased data set.
Although the effects of climate bias on prediction accuracy were found to be greater where geographical ranges were characterized by high spatial variation in the degree of climate bias (i.e. ranges where the bias of the various climates sampled by collections deviated considerably from the mean bias), the greatest influence on model performance was the number of presence points used to train the model. Main conclusions: These results demonstrate that predictions of species distributions can be quite good despite existing climatic biases in primary data found in natural history collections, if a sufficiently large number of training points is available. Because of consistent overprediction of models, these results also confirm the importance of validating models with independent data or expert opinion. Failure to include independent model validation, especially in cases where training points are limited, may potentially lead to grave errors in conservation decision-making and planning.
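
The maximum-entropy step described in the Methods of this abstract can be written as a constrained optimisation problem. The notation below is an assumption introduced for illustration and does not come from the abstract: X is the set of locations in the study area, f_j is the j-th environmental variable (n = 9 in this study), and x_1, ..., x_m are the training points (m = 20 or 100). It states only the equality-constrained form given in the abstract.

```latex
\[
\begin{aligned}
\hat{p} \;=\; \arg\max_{p}\; & -\sum_{x \in X} p(x)\,\ln p(x) \\
\text{subject to}\quad & \sum_{x \in X} p(x)\, f_j(x) \;=\; \frac{1}{m}\sum_{i=1}^{m} f_j(x_i),
  \qquad j = 1,\dots,n, \\
& \sum_{x \in X} p(x) = 1, \qquad p(x) \ge 0 \ \text{for all } x \in X .
\end{aligned}
\]
```

The objective is the entropy of p, so with no constraints the solution would be the uniform distribution; each constraint ties the expected value of one environmental variable under p to its average over the training points.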

288 citations


Cited by
Journal ArticleDOI
Abstract: Preface to the Princeton Landmarks in Biology Edition; Preface; Symbols Used; 1. The Importance of Islands; 2. Area and Number of Species; 3. Further Explanations of the Area-Diversity Pattern; 4. The Strategy of Colonization; 5. Invasibility and the Variable Niche; 6. Stepping Stones and Biotic Exchange; 7. Evolutionary Changes Following Colonization; 8. Prospect; Glossary; References; Index.

14,171 citations

01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler (specialized for single-cell data) and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal ArticleDOI
25 Apr 2013-Nature
TL;DR: These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue and will help to guide improvements in disease control strategies using vaccine, drug and vector control methods, and in their economic evaluation.
Abstract: Dengue is a systemic viral infection transmitted between humans by Aedes mosquitoes. For some patients, dengue is a life-threatening illness. There are currently no licensed vaccines or specific therapeutics, and substantial vector control efforts have not stopped its rapid emergence and global spread. The contemporary worldwide distribution of the risk of dengue virus infection and its public health burden are poorly known. Here we undertake an exhaustive assembly of known records of dengue occurrence worldwide, and use a formal modelling framework to map the global distribution of dengue risk. We then pair the resulting risk map with detailed longitudinal information from dengue cohort studies and population surfaces to infer the public health burden of dengue in 2010. We predict dengue to be ubiquitous throughout the tropics, with local spatial variations in risk influenced strongly by rainfall, temperature and the degree of urbanization. Using cartographic approaches, we estimate there to be 390 million (95% credible interval 284-528) dengue infections per year, of which 96 million (67-136) manifest apparently (any level of disease severity). This infection total is more than three times the dengue burden estimate of the World Health Organization. Stratification of our estimates by country allows comparison with national dengue reporting, after taking into account the probability of an apparent infection being formally reported. The most notable differences are discussed. These new risk maps and infection estimates provide novel insights into the global, regional and national public health burden imposed by dengue. We anticipate that they will provide a starting point for a wider discussion about the global impact of this disease and will help to guide improvements in disease control strategies using vaccine, drug and vector control methods, and in their economic evaluation.

7,238 citations

Journal ArticleDOI
TL;DR: Methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection. The results highlight the value of GLM in combination with penalised methods and threshold-based pre-selection when omitted variables are considered in the final interpretation.
Abstract: Collinearity refers to the non-independence of predictor variables, usually in a regression-type analysis. It is a common feature of any descriptive ecological data set and can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a statistical model. Collinearity is a severe problem when a model is trained on data from one region or time, and predicted to another with a different or unknown structure of collinearity. To demonstrate the reach of the problem of collinearity in ecology, we show how relationships among predictors differ between biomes, change over spatial scales and through time. Across disciplines, different approaches to addressing collinearity problems have been developed, ranging from clustering of predictors, threshold-based pre-selection, through latent variable methods, to shrinkage and regularisation. Using simulated data with five predictor-response relationships of increasing complexity and eight levels of collinearity we compared ways to address collinearity with standard multiple regression and machine-learning approaches. We assessed the performance of each approach by testing its impact on prediction to new data. In the extreme, we tested whether the methods were able to identify the true underlying relationship in a training dataset with strong collinearity by evaluating its performance on a test dataset without any collinearity. We found that methods specifically designed for collinearity, such as latent variable methods and tree-based models, did not outperform the traditional GLM and threshold-based pre-selection. Our results highlight the value of GLM in combination with penalised methods (particularly ridge) and threshold-based pre-selection when omitted variables are considered in the final interpretation.
However, all approaches tested yielded degraded predictions under change in collinearity structure, and the ‘folk lore’ threshold of |r| > 0.7 for correlation coefficients between predictor variables was an appropriate indicator for when collinearity begins to severely distort model estimation and subsequent prediction. The use of ecological understanding of the system in pre-analysis variable selection and the choice of the least sensitive statistical approaches reduce the problems of collinearity, but cannot ultimately solve them.
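
As a rough illustration of the threshold-based pre-selection with |r| > 0.7 mentioned in this abstract, the sketch below keeps a predictor only if its absolute Pearson correlation with every predictor already kept does not exceed the threshold. This is a minimal sketch and not the authors' procedure: the greedy first-come ordering, the function names `pearson` and `select_uncorrelated`, and the toy predictor values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_uncorrelated(columns, threshold=0.7):
    """Greedy pre-selection (hypothetical helper): walk the predictors in
    insertion order and keep one only if |r| <= threshold against every
    predictor kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: elevation is almost perfectly (negatively)
# correlated with temperature, rainfall is uncorrelated with it.
columns = {
    "temperature": [10.0, 12.0, 15.0, 18.0, 21.0, 25.0],
    "elevation": [2900.0, 2500.0, 2100.0, 1500.0, 1000.0, 300.0],
    "rainfall": [80.0, 120.0, 60.0, 110.0, 70.0, 100.0],
}

print(select_uncorrelated(columns))  # ['temperature', 'rainfall']
```

Because the selection is greedy, the result depends on the order of the predictors; in practice the order would reflect ecological understanding of which variables matter most, in line with the abstract's closing point about pre-analysis variable selection.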

6,199 citations

Journal ArticleDOI
TL;DR: This paper presents a tuning method that uses presence-only data for parameter tuning, and introduces several concepts that improve the predictive accuracy and running time of Maxent and describes a new logistic output format that gives an estimate of probability of presence.
Abstract: Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. 
Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) "target-group" background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.

5,314 citations