
Showing papers in "The Journal of Open Source Software in 2021"



Journal ArticleDOI
TL;DR: A crucial part of statistical analysis is evaluating a model’s quality and fit, or performance; with regression models in particular, this often involves selecting the best-fitting model among many competing models.
Abstract: A crucial part of statistical analysis is evaluating a model’s quality and fit, or performance. During analysis, especially with regression models, investigating the fit of models to data also often involves selecting the best fitting model amongst many competing models. Upon investigation, fit indices should also be reported both visually and numerically to bring readers in on the investigative effort.

973 citations
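The model-comparison step described above, scoring competing fits numerically and picking the best, can be illustrated with a small language-neutral sketch (the package itself is an R library; the R-squared and AIC formulas below are textbook definitions, not its API):

```python
import math

def fit_quality(y, y_hat, k):
    """Return (R^2, AIC) for predictions y_hat from a k-parameter model.

    AIC uses the Gaussian log-likelihood up to an additive constant:
    n*log(RSS/n) + 2k. Lower AIC indicates a better trade-off of fit
    against model complexity.
    """
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss, n * math.log(rss / n) + 2 * k

# Two competing models for the same data.
y = [1.0, 2.1, 2.9, 4.2, 5.1]
lin = [1.06, 2.06, 3.06, 4.06, 5.06]   # 2-parameter linear fit
const = [3.06] * 5                     # 1-parameter intercept-only fit
r2_lin, aic_lin = fit_quality(y, lin, k=2)
r2_const, aic_const = fit_quality(y, const, k=1)
print(r2_lin > r2_const, aic_lin < aic_const)  # the linear fit wins on both indices
```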



Journal ArticleDOI
TL;DR: The Pencil Code is a highly modular physics-oriented simulation code that can be adapted to a wide range of applications, primarily designed to solve partial differential equations of compressible hydrodynamics but can also evolve Lagrangian particles, their coagulation and condensation, as well as their interaction with the fluid.
Abstract: The Pencil Code is a highly modular physics-oriented simulation code that can be adapted to a wide range of applications. It is primarily designed to solve partial differential equations (PDEs) of compressible hydrodynamics and has lots of add-ons ranging from astrophysical magnetohydrodynamics (MHD) to meteorological cloud microphysics and engineering applications in combustion. Nevertheless, the framework is general and can also be applied to situations not related to hydrodynamics or even PDEs, for example when just the message passing interface or input/output strategies of the code are to be used. The code can also evolve Lagrangian (inertial and noninertial) particles, their coagulation and condensation, as well as their interaction with the fluid.

90 citations
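The Lagrangian particle capability mentioned above reduces, for non-inertial tracers, to integrating dx/dt = u(x) along the flow; a toy explicit-Euler update makes the idea concrete (conceptual sketch only; the Pencil Code itself is a Fortran framework with far richer particle physics):

```python
def advect_particles(xs, velocity, dt, steps):
    """Advance non-inertial (tracer) particles dx/dt = u(x) with
    explicit Euler steps, the simplest form of the Lagrangian particle
    update a fluid code couples to its solver.
    """
    for _ in range(steps):
        xs = [x + dt * velocity(x) for x in xs]
    return xs

# Uniform flow u = 1: every tracer drifts by dt * steps = 1.0.
xs = advect_particles([0.0, 0.5, 1.0], velocity=lambda x: 1.0, dt=0.01, steps=100)
print(xs)
```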


Journal ArticleDOI
TL;DR: Augur as mentioned in this paper is a bioinformatics toolkit designed for phylogenetic analyses of human pathogens, which can be used for real-time analyses of pathogen evolution and can adapt to a variety of questions and organisms.
Abstract: The analysis of human pathogens requires a diverse collection of bioinformatics tools. These tools include standard genomic and phylogenetic software and custom software developed to handle the relatively numerous and short genomes of viruses and bacteria. Researchers increasingly depend on the outputs of these tools to infer transmission dynamics of human diseases and make actionable recommendations to public health officials (Black et al., 2020; Gardy et al., 2015). In order to enable real-time analyses of pathogen evolution, bioinformatics tools must scale rapidly with the number of samples and be flexible enough to adapt to a variety of questions and organisms. To meet these needs, we developed Augur, a bioinformatics toolkit designed for phylogenetic analyses of human pathogens.

83 citations


Journal ArticleDOI
TL;DR: Exoplanet as discussed by the authors is a toolkit for probabilistic modeling of astronomical time series data, with a focus on observations of exoplanets, using PyMC3 (Salvatier et al., 2016).
Abstract: "exoplanet" is a toolkit for probabilistic modeling of astronomical time series data, with a focus on observations of exoplanets, using PyMC3 (Salvatier et al., 2016). PyMC3 is a flexible and high-performance model-building language and inference engine that scales well to problems with a large number of parameters. "exoplanet" extends PyMC3's modeling language to support many of the custom functions and probability distributions required when fitting exoplanet datasets or other astronomical time series. While it has been used for other applications, such as the study of stellar variability, the primary purpose of "exoplanet" is the characterization of exoplanets or multiple star systems using time-series photometry, astrometry, and/or radial velocity. In particular, the typical use case would be to use one or more of these datasets to place constraints on the physical and orbital parameters of the system, such as planet mass or orbital period, while simultaneously taking into account the effects of stellar variability.

76 citations
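The typical use case described above, constraining orbital parameters from time-series data, amounts to comparing a parametrized model against observations; a minimal pure-Python stand-in with a circular-orbit radial-velocity model (not the package's PyMC3-based API, which adds probabilistic inference on top):

```python
import math

def rv_model(t, period, semi_amp, phase):
    """Stellar radial velocity for a circular (zero-eccentricity) orbit."""
    return semi_amp * math.sin(2 * math.pi * t / period + phase)

def chi2(times, obs, err, period, semi_amp, phase):
    """Goodness of fit of the model against observed velocities."""
    return sum(((o - rv_model(t, period, semi_amp, phase)) / e) ** 2
               for t, o, e in zip(times, obs, err))

# Noiseless synthetic data from the "true" parameters P = 10 d, K = 5 m/s.
times = [1.3 * i for i in range(20)]
obs = [rv_model(t, 10.0, 5.0, 0.4) for t in times]
err = [0.5] * len(times)

chi_true = chi2(times, obs, err, 10.0, 5.0, 0.4)
chi_wrong = chi2(times, obs, err, 7.0, 5.0, 0.4)   # a wrong period fits worse
print(chi_true, chi_wrong)
```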


Journal ArticleDOI
TL;DR: UltraNest as discussed by the authors is a general-purpose Bayesian inference package for parameter estimation and model comparison that allows fitting arbitrary models specified as likelihood functions written in Python, C, C++, Fortran, Julia or R with a focus on correctness and speed.
Abstract: UltraNest is a general-purpose Bayesian inference package for parameter estimation and model comparison. It allows fitting arbitrary models specified as likelihood functions written in Python, C, C++, Fortran, Julia or R. With a focus on correctness and speed (in that order), UltraNest is especially useful for multi-modal or non-Gaussian parameter spaces, computationally expensive models, and robust pipelines. Parallelisation to computing clusters and resuming incomplete runs are available.

76 citations
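The quantity nested sampling estimates is the Bayesian evidence Z, the likelihood integrated over the prior, whose ratio between models gives the Bayes factor. In one dimension a brute-force sum makes the model-comparison idea concrete (illustration only; this is neither UltraNest's algorithm nor its API):

```python
import math

def evidence(loglike, n=20000):
    """Midpoint-rule estimate of Z = integral of L(x) over a uniform
    prior on [0, 1]. Nested sampling computes this same integral far
    more efficiently in many dimensions; a Riemann sum suffices for a
    1-D toy.
    """
    return sum(math.exp(loglike((i + 0.5) / n)) for i in range(n)) / n

# Two models: one whose likelihood peaks inside the prior range,
# one whose peak lies far outside it.
peaked = lambda x: -0.5 * ((x - 0.5) / 0.05) ** 2
offset = lambda x: -0.5 * ((x - 2.0) / 0.05) ** 2
z_peaked, z_offset = evidence(peaked), evidence(offset)
print(z_peaked > z_offset)  # the Bayes factor z_peaked / z_offset favours the first
```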


Journal ArticleDOI
TL;DR: The academictwitteR package is built with academic research in mind; it encourages efficient and responsible storage of data, given the likely large amounts being collected, and provides a number of shortcut and query-building functions to access the new v2 API endpoints.
Abstract: In January, 2021, Twitter announced the “Academic Research Product Track.” This provides academic researchers with greatly expanded access to Twitter data. Existing R packages for querying the Twitter API, such as the popular rtweet package (Kearney, 2019), are yet to introduce functionality to allow users to connect to the new v2 API endpoints with Academic Research Product Track credentials. The academictwitteR package (Barrie & Ho, 2021) is built with academic research in mind. It encourages efficient and responsible storage of data, given the likely large amounts of data being collected, and provides a number of shortcut and query-building functions to access the new v2 API endpoints.

50 citations





Journal ArticleDOI
TL;DR: This work provides users a simple interface for fitting and using state-of-the-art interpretable models, all compatible with scikit-learn, and provides a framework for developing custom tools and rule-based models for interpretability.
Abstract: imodels is a Python package for concise, transparent, and accurate predictive modeling. It provides users a simple interface for fitting and using state-of-the-art interpretable models, all compatible with scikit-learn (Pedregosa et al., 2011). These models can often replace black-box models while improving interpretability and computational efficiency, all without sacrificing predictive accuracy. In addition, the package provides a framework for developing custom tools and rule-based models for interpretability.
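The kind of transparent, auditable model described above can be mimicked with a one-rule classifier exposing the familiar scikit-learn fit/predict interface; this is a hypothetical sketch of the idea, not the imodels API:

```python
class OneRuleClassifier:
    """A transparent one-rule model: threshold a single feature.

    Mimics the scikit-learn fit/predict convention; the learned rule
    can be printed and audited, unlike a black-box model.
    """
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):                    # try every feature
            for t in sorted({row[j] for row in X}):   # and every threshold
                pred = [1 if row[j] > t else 0 for row in X]
                acc = sum(p == yi for p, yi in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, j, t)
        self.accuracy_, self.feature_, self.threshold_ = best
        return self

    def predict(self, X):
        return [1 if row[self.feature_] > self.threshold_ else 0 for row in X]

X = [[0.1, 5], [0.4, 3], [0.6, 8], [0.9, 1]]
y = [0, 0, 1, 1]
clf = OneRuleClassifier().fit(X, y)
print(f"if x[{clf.feature_}] > {clf.threshold_} then 1 else 0")  # the whole model
```

The entire fitted model is one human-readable sentence, which is the point of interpretable modeling: accuracy is traded off against the ability to inspect exactly what the model does.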

Journal ArticleDOI
TL;DR: ngsxfem as discussed by the authors is an add-on library to Netgen/NGSolve, a general purpose, high performance finite element library for the numerical solution of partial differential equations.
Abstract: ngsxfem is an add-on library to Netgen/NGSolve, a general purpose, high performance finite element library for the numerical solution of partial differential equations. The add-on enables the use of geometrically unfitted finite element technologies known under different labels, e.g. XFEM, CutFEM, TraceFEM, Finite Cell, fictitious domain, or cut-cell methods. Both Netgen/NGSolve and ngsxfem are written in C++ with a rich Python interface, through which the library is typically used. ngsxfem is academic software; its primary intention is to facilitate the development and validation of new numerical methods.
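The core idea of geometrically unfitted methods is that the mesh ignores the geometry, which is instead described by a level set function; cells where the level set changes sign are "cut" and receive special treatment (e.g. custom quadrature). A conceptual sketch of that classification step, not the ngsxfem API:

```python
def classify_cells(phi, nx, ny, h):
    """Classify cells of a uniform grid against a level set phi.

    A cell is NEG/POS if phi has one sign at all four corners, and CUT
    otherwise; cut cells are where unfitted methods such as CutFEM
    apply special integration rules.
    """
    kinds = {}
    for i in range(nx):
        for j in range(ny):
            corners = [phi(i * h, j * h), phi((i + 1) * h, j * h),
                       phi(i * h, (j + 1) * h), phi((i + 1) * h, (j + 1) * h)]
            if all(c < 0 for c in corners):
                kinds[(i, j)] = "NEG"
            elif all(c > 0 for c in corners):
                kinds[(i, j)] = "POS"
            else:
                kinds[(i, j)] = "CUT"
    return kinds

# Level set of a disc of radius 0.3 centred in the unit square.
phi = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 - 0.3 ** 2
kinds = classify_cells(phi, nx=10, ny=10, h=0.1)
print(sum(k == "CUT" for k in kinds.values()), "cut cells")
```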

Journal ArticleDOI
TL;DR: Mikropml as discussed by the authors is an easy-to-use R package that implements ML pipelines using regression, support vector machines, decision trees, random forest, or gradient-boosted trees.
Abstract: Machine learning (ML) for classification and prediction based on a set of features is used to make decisions in healthcare, economics, criminal justice and more. However, implementing an ML pipeline including preprocessing, model selection, and evaluation can be time-consuming, confusing, and difficult. Here, we present mikropml (pronounced "meek-ROPE em el"), an easy-to-use R package that implements ML pipelines using regression, support vector machines, decision trees, random forest, or gradient-boosted trees. The package is available on GitHub, CRAN, and conda.

Journal ArticleDOI
TL;DR: The see package, a core pillar of easystats, helps users to utilize visualization for more informative, communicable, and well-rounded scientific reporting.
Abstract: The see package is embedded in the easystats ecosystem, a collection of R packages that operate in synergy to provide a consistent and intuitive syntax when working with statistical models in the R programming language (R Core Team, 2021). Most easystats packages return comprehensive numeric summaries of model parameters and performance. The see package complements these numeric summaries with a host of functions and tools to produce a range of publication-ready visualizations for model parameters, predictions, and performance diagnostics. As a core pillar of easystats, the see package helps users to utilize visualization for more informative, communicable, and well-rounded scientific reporting.


Journal ArticleDOI
TL;DR: This paper discusses the challenges scientists and students face in finding a suitable platform for data analysis and publishing research outputs in quality journals; some web applications used by the agricultural research community do not provide options to generate plots and graphs.
Abstract: Agricultural experiments demand a wide range of statistical tools for analysis, including exploratory analysis, design of experiments, and statistical genetics. It is a challenge for scientists and students to find a suitable platform for data analysis and to publish the research outputs in quality journals. Most of the software available for data analysis is proprietary or lacks a simple user interface; for example, SAS® is available in ICAR (Indian Council of Agricultural Research) for data analysis, but although it is a highly advanced statistical analysis platform, its complexity holds back students and researchers from using it. Some web applications like WASP (https://ccari.res.in/waspnew.html) and OPSTAT (http://14.139.232.166/opstat/) used by the agricultural research community are user friendly, but these applications don't provide options to generate plots and graphs.





Journal ArticleDOI
TL;DR: Iharm3D as discussed by the authors is an open-source C code for simulating black hole accretion systems in arbitrary stationary spacetimes using ideal general-relativistic magnetohydrodynamics (GRMHD).
Abstract: Iharm3D is an open-source C code for simulating black hole accretion systems in arbitrary stationary spacetimes using ideal general-relativistic magnetohydrodynamics (GRMHD). It is an implementation of the HARM ("High Accuracy Relativistic Magnetohydrodynamics") algorithm outlined in Gammie et al. (2003) with updates as outlined in McKinney & Gammie (2004) and Noble et al. (2006). The code is most directly derived from Ryan et al. (2015) but with radiative transfer portions removed. HARM is a conservative finite-volume scheme for solving the equations of ideal GRMHD, a hyperbolic system of partial differential equations, on a logically Cartesian mesh in arbitrary coordinates.
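The defining property of a conservative finite-volume scheme such as HARM is that each cell is updated by the difference of fluxes at its faces, so the discrete total of the conserved variables is preserved exactly; a 1-D scalar advection toy with upwind fluxes shows the structure (illustration of the scheme class only, far simpler than the GRMHD system):

```python
def advect_step(u, c, dt, dx):
    """One conservative finite-volume update for du/dt + c*du/dx = 0.

    Fluxes are evaluated at cell faces (upwind, assuming c > 0) and
    each cell changes by its flux difference, so the total sum(u)*dx
    is conserved to round-off. Periodic boundaries.
    """
    n = len(u)
    flux = [c * u[i - 1] for i in range(n)]   # face i sits left of cell i
    return [u[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]

u = [0.0] * 10
u[3] = 1.0                                    # a pulse of conserved quantity
total = sum(u)
for _ in range(25):
    u = advect_step(u, c=1.0, dt=0.05, dx=0.1)
print(abs(sum(u) - total))                    # conservation error: round-off only
```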


Journal ArticleDOI
TL;DR: The exploding field of single cell transcriptomics has begun to enable deep analysis of gene expression and cell types, but spatial context is lost in the preparation of tissue for these assays.
Abstract: The exploding field of single cell transcriptomics has begun to enable deep analysis of gene expression and cell types, but spatial context is lost in the preparation of tissue for these assays. Recent developments in biochemistry, microfluidics, and microscopy have come together to bring about an “alphabet soup” of technologies that enable sampling gene expression in situ, with varying levels of spatial resolution, sensitivity, and genetic depth. These technologies promise to permit biologists to ask new questions about the spatial relationships between cell type and interactions between gene expression and cell morphology. However, these assays generate very large microscopy datasets which are challenging to process using general microscopy analysis tools. Furthermore, many of these assays require specialized analysis to decode gene expression from multiplexed experimental designs.

Journal ArticleDOI
TL;DR: EUKulele as mentioned in this paper is an open-source software tool designed to assign taxonomy to microeukaryotes detected in meta-omic samples; it complements analysis approaches in other domains by accommodating assembly output and providing concrete metrics reporting the taxonomic completeness of each sample.
Abstract: The assessment of microbial species biodiversity is essential in ecology and evolutionary biology (Reaka-Kudla et al. 1996), but especially challenging for communities of microorganisms found in the environment (Das et al. 2006, Hillebrand et al. 2018). Beyond providing a census of organisms in the ocean, assessing marine microbial biodiversity can reveal how microbes respond to environmental change (Salazar et al. 2017), clarify the ecological roles of community members (Hehemann et al. 2016), and lead to biotechnology discoveries (Das et al. 2006). Computational approaches to characterize taxonomic diversity and phylogeny based on the quality of available data for environmental sequence datasets are fundamental for advancing our understanding of the role of these organisms in the environment. Even more pressing is the need for comprehensive and consistent methods to assign taxonomy to environmentally-relevant microbial eukaryotes. Here, we present EUKulele, an open-source software tool designed to assign taxonomy to microeukaryotes detected in meta-omic samples; it complements analysis approaches in other domains by accommodating assembly output and providing concrete metrics reporting the taxonomic completeness of each sample.
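Taxonomic assignment of the kind described above amounts to matching query sequences against an annotated reference database; a toy alignment-free stand-in using shared k-mers sketches the matching idea (hypothetical sequences and taxa; EUKulele's actual alignment-based scoring is more sophisticated):

```python
def kmers(seq, k=4):
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_taxonomy(query, reference, k=4):
    """Assign the reference taxon sharing the most k-mers with the query.

    Real tools score alignments against curated databases and report
    confidence metrics; this shared-k-mer vote only illustrates the
    database-matching idea.
    """
    q = kmers(query, k)
    return max(reference, key=lambda taxon: len(q & kmers(reference[taxon], k)))

# Hypothetical reference sequences for two taxa.
reference = {
    "Emiliania": "ATGGCGTACGTTAGCTAGGCTA",
    "Phaeodactylum": "TTGACCGATCGGATCCGATTGA",
}
hit = assign_taxonomy("ATGGCGTACGTTAGCTAGGCAA", reference)  # near-copy of the first
print(hit)
```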

Journal ArticleDOI
TL;DR: The isoreader package implements an easily extendable interface for IRMS data from common instrument vendor file formats and thus enables the reading and processing of stable isotope data directly from the source, providing a foundational tool for platform-independent, efficient and reproducible data reduction.
Abstract: The measurement and interpretation of the stable isotope composition of any material or molecule has widespread application in disciplines ranging from the earth sciences to ecology, anthropology, and forensics. The naturally occurring differences in the abundance of the stable isotopes of carbon, nitrogen, oxygen, and many other elements provide valuable insight into environmental conditions and sources, fluxes, and mechanisms of material transfer. Because isotopic variations in nature are very small, the measurement itself requires cutting edge analytical instrumentation using isotope ratio mass spectrometry (IRMS) as well as rigorous data reduction procedures for calibration and quality control. The isoreader package implements an easily extendable interface for IRMS data from common instrument vendor file formats and thus enables the reading and processing of stable isotope data directly from the source. This provides a foundational tool for platform-independent, efficient and reproducible data reduction.


Journal ArticleDOI
TL;DR: The Python package libfmp is introduced, which provides implementations of well-established model-based algorithms for various MIR tasks (with a focus on the audio domain), including beat tracking, onset detection, chord recognition, music synchronization, version identification, music segmentation, novelty detection, and audio decomposition.
Abstract: The revolution in music distribution, storage, and consumption has fueled tremendous interest in developing techniques and tools for organizing, structuring, retrieving, navigating, and presenting music-related data. As a result, the academic field of music information retrieval (MIR) has matured over the last 20 years into an independent research area related to many different disciplines, including engineering, computer science, mathematics, and musicology. In this contribution, we introduce the Python package libfmp, which provides implementations of well-established model-based algorithms for various MIR tasks (with a focus on the audio domain), including beat tracking, onset detection, chord recognition, music synchronization, version identification, music segmentation, novelty detection, and audio decomposition. Such traditional approaches not only yield valuable baselines for modern data-driven strategies (e.g., using deep learning) but are also instructive from an educational viewpoint, deepening the understanding of the MIR task and music data at hand. Our libfmp package is inspired by and closely follows conventions introduced by librosa, a widely used Python library containing standardized and flexible reference implementations of many common methods in audio and music processing (McFee et al., 2015). While the two packages overlap concerning basic feature extraction and MIR algorithms, libfmp contains several reference implementations of advanced music processing pipelines not yet covered by librosa (or other open-source software). Whereas the librosa package is intended to facilitate the high-level composition of basic methods into complex pipelines, a major emphasis of libfmp is on the educational side, promoting the understanding of MIR concepts by closely following the textbook on Fundamentals of Music Processing (FMP) (Müller, 2015).
In this way, we hope that libfmp constitutes a valuable complement to existing open-source toolboxes such as librosa while fostering education and research in MIR.
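One of the MIR tasks listed above, novelty detection, can be sketched as spectral flux: the summed positive increases between consecutive feature frames, whose peaks mark onsets or segment boundaries (conceptual toy with made-up feature frames; libfmp ships full reference implementations of such pipelines):

```python
def novelty(frames):
    """Spectral-flux novelty curve: for each frame, sum the positive
    increases over the previous frame's feature values. Peaks in the
    curve indicate moments where new energy appears, e.g. note onsets
    or section boundaries.
    """
    curve = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        curve.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return curve

# Toy feature sequence: energy enters new bands at frames 2 and 4.
frames = [[1, 0, 0], [1, 0, 0], [1, 5, 0], [1, 5, 0], [1, 5, 7]]
curve = novelty(frames)
print(curve)  # peaks at the two change points
```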


Journal ArticleDOI
TL;DR: One structural characteristic of networks that is investigated frequently across various sciences is the detection of communities, which are strongly connected subgraphs in the network such as groups of friends, thematic fields, or latent factors.
Abstract: Modeling complex phenomena as networks constitutes one of the most versatile, if not the most versatile, fields of research (Barabási, 2011). Indeed, many interconnected entities can be represented as networks, in which entities are called nodes and their connections are called edges. For instance, networks can represent friendships between people, hyperlinks between web pages, or correlations between questionnaire items. One structural characteristic of networks that is investigated frequently across various sciences is the detection of communities (Fortunato, 2010). Communities are strongly connected subgraphs in the network such as groups of friends, thematic fields, or latent factors. Most community detection algorithms thereby put each node in only one community. However, nodes are often shared by multiple communities, e.g., when a person is part of multiple groups of friends, web pages belong to different thematic fields, or items load on multiple factors. The most popular community detection algorithm aimed at identifying such overlapping communities is the clique percolation algorithm (Farkas et al., 2007; Palla et al., 2005).
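The clique percolation idea can be stated compactly: a community is a connected union of k-cliques, where two k-cliques count as adjacent if they share k-1 nodes, so one node may belong to several communities. A brute-force sketch for small graphs (not a library implementation; real tools enumerate cliques far more efficiently):

```python
from itertools import combinations

def clique_percolation(edges, k=3):
    """Overlapping communities via clique percolation (Palla et al., 2005).

    Finds all k-cliques by brute force, then takes connected components
    of the clique-adjacency graph (cliques sharing k-1 nodes).
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    communities, unseen = [], set(range(len(cliques)))
    while unseen:
        stack = [unseen.pop()]
        members = set()
        while stack:                       # flood-fill over adjacent cliques
            i = stack.pop()
            members |= cliques[i]
            linked = {j for j in unseen if len(cliques[i] & cliques[j]) == k - 1}
            unseen -= linked
            stack.extend(linked)
        communities.append(members)
    return communities

# Two triangles sharing node 3: the node ends up in both communities.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
coms = clique_percolation(edges, k=3)
print(coms)
```

With two triangles joined only at node 3, the triangles share one node (fewer than k-1 = 2), so they form separate communities, yet node 3 belongs to both, exactly the overlap that one-community-per-node algorithms cannot express.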