Author

Marinez Ferreira de Siqueira

Bio: Marinez Ferreira de Siqueira is an academic researcher from University of York. The author has contributed to research in topics: Environmental niche modelling & Biodiversity. The author has an h-index of 17 and has co-authored 48 publications receiving 8068 citations.


Papers
Journal ArticleDOI
TL;DR: This article estimates the spatial distribution of Atlantic cloud forest and its protection status in the Serra da Mantiqueira, southeastern Brazil, using a combination of predictive distribution modelling and remote sensing techniques.

12 citations

Journal ArticleDOI
TL;DR: It is demonstrated that data mining, which has been used successfully in applications such as business and consumer profile analysis, can be a useful resource in ecology.
Abstract: We used association rules for extracting patterns of co-occurrences. We obtained pairs and groups of species with positive and negative correlation. One multiscale, spatially explicit and multi-species method was proposed. Data mining can be useful to produce lists of species with co-occurrences. The continuous growth of biodiversity databases has led to a search for techniques that can assist researchers. This paper presents a method for the analysis of occurrences of pairs and groups of species that aims to identify patterns in co-occurrences through the application of association rules of data mining. We propose, implement and evaluate a tool to help ecologists formulate and validate hypotheses regarding co-occurrence between two or more species. To validate our approach, we analyzed the occurrence of species with a dataset from the 50-ha Forest Dynamics Project on Barro Colorado Island (BCI). Three case studies were developed based on this tropical forest to evaluate patterns of positive and negative correlation. Our tool can be used to point out co-occurrence at multiple scales and for multiple species simultaneously, accelerating the identification process for Spatial Point Pattern Analysis. This paper demonstrates that data mining, which has been used successfully in applications such as business and consumer profile analysis, can be a useful resource in ecology.
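To make the association-rule idea concrete, here is a minimal sketch of how support, confidence and lift could be computed for species pairs. It is not the authors' tool: the function name `pairwise_rules`, the quadrat-as-set input format, the `min_support` default and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(plots, min_support=0.2):
    """Return {(a, b): (support, confidence, lift)} for species pairs.

    `plots` is a list of sets, one set of species names per sampling
    quadrat (a hypothetical input format, not the paper's).
    """
    n = len(plots)
    single = Counter()  # number of quadrats containing each species
    pair = Counter()    # number of quadrats containing both species
    for species in plots:
        single.update(species)
        pair.update(combinations(sorted(species), 2))

    rules = {}
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        confidence = count / single[a]  # estimate of P(b present | a present)
        lift = support / ((single[a] / n) * (single[b] / n))
        rules[(a, b)] = (support, confidence, lift)
    return rules


# Toy data: four quadrats, three species.
plots = [{"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"C"}]
rules = pairwise_rules(plots)
for pair_key, (support, confidence, lift) in sorted(rules.items()):
    print(pair_key, round(support, 2), round(confidence, 2), round(lift, 2))
```

A lift above 1 suggests the two species co-occur more often than independence would predict (positive association), and a lift below 1 suggests the opposite. Pairs that never share a quadrat do not appear in this sketch at all, so a fuller treatment of negative association would need to enumerate them explicitly.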

12 citations

Posted ContentDOI
03 Apr 2020-bioRxiv
TL;DR: This paper presents modleR, a four-step workflow that wraps some of the common phases executed during an ecological niche modeling procedure and that can be run in interactive local sessions and on high-performance or high-throughput computational (HPC/HTC) platforms.
Abstract: Ecological niche models (ENM) use the environmental variables associated with the currently known distribution of a species to model its ecological niche and project it into the geographic space. Widely used and misused, ENM has become a common tool for ecologists and decision-makers. Many ENM platforms have been developed over the years, first as standalone programs, later as packages within script-based programming languages and environments. The democratization of these programming tools and the advent of Open Science brought a growing concern regarding the reproducibility, transparency, robustness, portability, and interoperability in ENM workflows. ENM workflows have some core components that are replicated between projects. However, they have a large internal variation due to the variety of research questions and applications. Any ecological niche modeling platform should take into account this trade-off between stability and reproducibility on one hand, and flexibility and decision-making on the other. Here, we present modleR, a four-step workflow that wraps some of the common phases executed during an ecological niche model procedure. We have divided the process into (1) data setup, (2) model fitting and projection, (3) partition joining and (4) ensemble modeling (consensus between algorithms). modleR is highly adaptable and replicable depending on the user's needs and is open to deeper internal parametrization. It can be used as a testing platform due to its consistent folder structure and its capacity to control some sources of variation while changing others. It can be run in interactive local sessions and in high-performance or high-throughput computational (HPC/HTC) platforms and parallelized by species or algorithms. It can also communicate with other tools in the field, allowing the user to enter and exit the workflow at any phase, and execute complementary routines outside the package. Finally, it records metadata and session information at each step, ensuring reproducibility beyond the use of script-based applications.
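The ensemble step (consensus between algorithms) can be illustrated with a small sketch. modleR itself is an R package and this is not its API; the function names, the per-cell list layout, the 0.5 threshold and the suitability values below are hypothetical, chosen only to show two common ways of combining per-algorithm predictions.

```python
def ensemble_mean(predictions):
    """Average predicted suitability per grid cell across algorithms.

    `predictions` maps an algorithm name to a list of suitability values,
    one per grid cell, all lists in the same cell order (assumed layout).
    """
    maps = list(predictions.values())
    return [sum(cell) / len(maps) for cell in zip(*maps)]


def ensemble_agreement(predictions, threshold=0.5):
    """Fraction of algorithms predicting presence in each grid cell.

    A cell counts as "presence" for an algorithm when its suitability is
    at or above `threshold` (an illustrative cut-off, not a recommendation).
    """
    maps = list(predictions.values())
    return [
        sum(value >= threshold for value in cell) / len(maps)
        for cell in zip(*maps)
    ]


# Hypothetical outputs of three algorithms over three grid cells.
predictions = {
    "algo_a": [0.9, 0.2, 0.6],
    "algo_b": [0.7, 0.4, 0.3],
    "algo_c": [0.8, 0.0, 0.6],
}
mean_map = ensemble_mean(predictions)
agreement_map = ensemble_agreement(predictions)
print(mean_map)
print(agreement_map)
```

The mean keeps continuous suitability, while the agreement map shows where algorithms disagree, which can be as informative as the consensus itself.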

10 citations

Posted ContentDOI
08 Apr 2021-bioRxiv
TL;DR: plantR is an open-source package that provides a comprehensive tool-box to manage species records from biological collections, including tools to download records from different data repositories, standardize typical fields associated with species records, and validate the locality, geographical coordinates, taxonomic nomenclature, and species identifications.
Abstract: Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, biogeography, macroecology, and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting, and standardization of the data, leaving these tasks to the final users of the species records (e.g., taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users perform those tasks, we introduce plantR, an open-source package that provides a comprehensive tool-box to manage species records from biological collections. The package is accompanied by the proposal of a reproducible workflow to manage this type of data in taxonomy, ecology, and biodiversity conservation. It is implemented in R and designed to handle relatively large data sets as fast as possible. Initially designed to handle plant species records, many of the plantR features also apply to other groups of organisms, given that the data structure is similar. The plantR workflow includes tools to (1) download records from different data repositories, (2) standardize typical fields associated with species records, (3) validate the locality, geographical coordinates, taxonomic nomenclature, and species identifications, including the retrieval of duplicates across collections, and (4) summarize and export records, including the construction of species checklists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above. But in addition to the new features and resources related to the data editing and validation, the greatest strength of plantR is to provide a comprehensive and user-friendly workflow in one single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers better assess data quality and avoid data leakage in a wide variety of studies using species records.

9 citations


Cited by
Journal ArticleDOI
TL;DR: This paper introduces the use of the maximum entropy method (Maxent) for modeling species geographic distributions with presence-only data; Maxent is a general-purpose machine learning method with a simple and precise mathematical formulation.

13,120 citations

01 Jun 2012
TL;DR: This paper describes SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrates that it improves on the recently released E+V-SC assembler and on popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: This work compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date and found that presence-only data were effective for modelling species' distributions for many species and regions.
Abstract: Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.
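The evaluation design described above (fit on presence-only data, score against independent presence-absence data) can be sketched with one widely used discrimination metric, the area under the ROC curve (AUC). The abstract does not name the metrics used, so the choice of AUC here is an assumption, and the function name and scores are hypothetical.

```python
def auc(scores_at_presences, scores_at_absences):
    """Rank-based AUC: the probability that a randomly chosen presence
    site receives a higher predicted score than a randomly chosen
    absence site, counting ties as half a win."""
    wins = 0.0
    for p in scores_at_presences:
        for a in scores_at_absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_at_presences) * len(scores_at_absences))


# Hypothetical model predictions at independent evaluation sites.
presence_scores = [0.9, 0.8, 0.4]
absence_scores = [0.5, 0.3, 0.1]
result = auc(presence_scores, absence_scores)
print(round(result, 3))
```

An AUC of 0.5 corresponds to predictions no better than random ranking, and 1.0 to perfect separation of presence and absence sites. The pairwise loop is quadratic in the number of sites, which is fine for a sketch but slow for large evaluation sets.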

7,589 citations

Journal ArticleDOI
TL;DR: An overview of recent advances in species distribution models, and new avenues for incorporating species migration, population dynamics, biotic interactions and community ecology into SDMs at multiple spatial scales are suggested.
Abstract: In the last two decades, interest in species distribution models (SDMs) of plants and animals has grown dramatically. Recent advances in SDMs allow us to potentially forecast anthropogenic effects on patterns of biodiversity at different spatial scales. However, some limitations still preclude the use of SDMs in many theoretical and practical applications. Here, we provide an overview of recent advances in this field, discuss the ecological principles and assumptions underpinning SDMs, and highlight critical limitations and decisions inherent in the construction and evaluation of SDMs. Particular emphasis is given to the use of SDMs for the assessment of climate change impacts and conservation management issues. We suggest new avenues for incorporating species migration, population dynamics, biotic interactions and community ecology into SDMs at multiple spatial scales. Addressing all these issues requires a better integration of SDMs with ecological theory.

5,620 citations