
Showing papers in "The Journal of Open Source Software in 2021"



Journal ArticleDOI
TL;DR: A crucial part of statistical analysis is evaluating a model’s quality and fit, or performance; with regression models in particular, this often involves selecting the best-fitting model among many competing models.
Abstract: A crucial part of statistical analysis is evaluating a model’s quality and fit, or performance. During analysis, especially with regression models, investigating the fit of models to data also often involves selecting the best fitting model amongst many competing models. Upon investigation, fit indices should also be reported both visually and numerically to bring readers in on the investigative effort.

973 citations
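The model-comparison step described above, scoring competing fits numerically and picking the best, can be illustrated with a small language-neutral sketch (the package itself is an R library; the R-squared and AIC formulas below are textbook definitions, not its API):

```python
import math

def fit_quality(y, y_hat, k):
    """Return (R^2, AIC) for predictions y_hat from a k-parameter model.

    AIC uses the Gaussian log-likelihood up to an additive constant:
    n*log(RSS/n) + 2k. Lower AIC indicates a better trade-off of fit
    against model complexity.
    """
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss, n * math.log(rss / n) + 2 * k

# Two competing models for the same data.
y = [1.0, 2.1, 2.9, 4.2, 5.1]
lin = [1.06, 2.06, 3.06, 4.06, 5.06]   # 2-parameter linear fit
const = [3.06] * 5                     # 1-parameter intercept-only fit
r2_lin, aic_lin = fit_quality(y, lin, k=2)
r2_const, aic_const = fit_quality(y, const, k=1)
print(r2_lin > r2_const, aic_lin < aic_const)  # the linear fit wins on both indices
```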



Journal ArticleDOI
TL;DR: The Pencil Code is a highly modular physics-oriented simulation code that can be adapted to a wide range of applications, primarily designed to solve partial differential equations of compressible hydrodynamics but can also evolve Lagrangian particles, their coagulation and condensation, as well as their interaction with the fluid.
Abstract: The Pencil Code is a highly modular physics-oriented simulation code that can be adapted to a wide range of applications. It is primarily designed to solve partial differential equations (PDEs) of compressible hydrodynamics and has lots of add-ons ranging from astrophysical magnetohydrodynamics (MHD) to meteorological cloud microphysics and engineering applications in combustion. Nevertheless, the framework is general and can also be applied to situations not related to hydrodynamics or even PDEs, for example when just the message passing interface or input/output strategies of the code are to be used. The code can also evolve Lagrangian (inertial and noninertial) particles, their coagulation and condensation, as well as their interaction with the fluid.

90 citations
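The Lagrangian particle capability mentioned above reduces, for non-inertial tracers, to integrating dx/dt = u(x) along the flow; a toy explicit-Euler update makes the idea concrete (conceptual sketch only; the Pencil Code itself is a Fortran framework with far richer particle physics):

```python
def advect_particles(xs, velocity, dt, steps):
    """Advance non-inertial (tracer) particles dx/dt = u(x) with
    explicit Euler steps, the simplest form of the Lagrangian particle
    update a fluid code couples to its solver.
    """
    for _ in range(steps):
        xs = [x + dt * velocity(x) for x in xs]
    return xs

# Uniform flow u = 1: every tracer drifts by dt * steps = 1.0.
xs = advect_particles([0.0, 0.5, 1.0], velocity=lambda x: 1.0, dt=0.01, steps=100)
print(xs)
```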


Journal ArticleDOI
TL;DR: Augur as mentioned in this paper is a bioinformatics toolkit designed for phylogenetic analyses of human pathogens, which can be used for real-time analyses of pathogen evolution and can adapt to a variety of questions and organisms.
Abstract: The analysis of human pathogens requires a diverse collection of bioinformatics tools. These tools include standard genomic and phylogenetic software and custom software developed to handle the relatively numerous and short genomes of viruses and bacteria. Researchers increasingly depend on the outputs of these tools to infer transmission dynamics of human diseases and make actionable recommendations to public health officials (Black et al., 2020; Gardy et al., 2015). In order to enable real-time analyses of pathogen evolution, bioinformatics tools must scale rapidly with the number of samples and be flexible enough to adapt to a variety of questions and organisms. To meet these needs, we developed Augur, a bioinformatics toolkit designed for phylogenetic analyses of human pathogens.

83 citations


Journal ArticleDOI
TL;DR: Exoplanet as discussed by the authors is a toolkit for probabilistic modeling of astronomical time series data, with a focus on observations of exoplanets, using PyMC3 (Salvatier et al., 2016).
Abstract: "exoplanet" is a toolkit for probabilistic modeling of astronomical time series data, with a focus on observations of exoplanets, using PyMC3 (Salvatier et al., 2016). PyMC3 is a flexible and high-performance model-building language and inference engine that scales well to problems with a large number of parameters. "exoplanet" extends PyMC3's modeling language to support many of the custom functions and probability distributions required when fitting exoplanet datasets or other astronomical time series. While it has been used for other applications, such as the study of stellar variability, the primary purpose of "exoplanet" is the characterization of exoplanets or multiple star systems using time-series photometry, astrometry, and/or radial velocity. In particular, the typical use case would be to use one or more of these datasets to place constraints on the physical and orbital parameters of the system, such as planet mass or orbital period, while simultaneously taking into account the effects of stellar variability.

76 citations
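The typical use case described above, constraining orbital parameters from time-series data, amounts to comparing a parametrized model against observations; a minimal pure-Python stand-in with a circular-orbit radial-velocity model (not the package's PyMC3-based API, which adds probabilistic inference on top):

```python
import math

def rv_model(t, period, semi_amp, phase):
    """Stellar radial velocity for a circular (zero-eccentricity) orbit."""
    return semi_amp * math.sin(2 * math.pi * t / period + phase)

def chi2(times, obs, err, period, semi_amp, phase):
    """Goodness of fit of the model against observed velocities."""
    return sum(((o - rv_model(t, period, semi_amp, phase)) / e) ** 2
               for t, o, e in zip(times, obs, err))

# Noiseless synthetic data from the "true" parameters P = 10 d, K = 5 m/s.
times = [1.3 * i for i in range(20)]
obs = [rv_model(t, 10.0, 5.0, 0.4) for t in times]
err = [0.5] * len(times)

chi_true = chi2(times, obs, err, 10.0, 5.0, 0.4)
chi_wrong = chi2(times, obs, err, 7.0, 5.0, 0.4)   # a wrong period fits worse
print(chi_true, chi_wrong)
```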


Journal ArticleDOI
TL;DR: UltraNest as discussed by the authors is a general-purpose Bayesian inference package for parameter estimation and model comparison that allows fitting arbitrary models specified as likelihood functions written in Python, C, C++, Fortran, Julia or R with a focus on correctness and speed.
Abstract: UltraNest is a general-purpose Bayesian inference package for parameter estimation and model comparison. It allows fitting arbitrary models specified as likelihood functions written in Python, C, C++, Fortran, Julia or R. With a focus on correctness and speed (in that order), UltraNest is especially useful for multi-modal or non-Gaussian parameter spaces, computationally expensive models, and robust pipelines. Parallelisation to computing clusters and resuming incomplete runs are available.

76 citations
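The quantity nested sampling estimates is the Bayesian evidence Z, the likelihood integrated over the prior, whose ratio between models gives the Bayes factor. In one dimension a brute-force sum makes the model-comparison idea concrete (illustration only; this is neither UltraNest's algorithm nor its API):

```python
import math

def evidence(loglike, n=20000):
    """Midpoint-rule estimate of Z = integral of L(x) over a uniform
    prior on [0, 1]. Nested sampling computes this same integral far
    more efficiently in many dimensions; a Riemann sum suffices for a
    1-D toy.
    """
    return sum(math.exp(loglike((i + 0.5) / n)) for i in range(n)) / n

# Two models: one whose likelihood peaks inside the prior range,
# one whose peak lies far outside it.
peaked = lambda x: -0.5 * ((x - 0.5) / 0.05) ** 2
offset = lambda x: -0.5 * ((x - 2.0) / 0.05) ** 2
z_peaked, z_offset = evidence(peaked), evidence(offset)
print(z_peaked > z_offset)  # the Bayes factor z_peaked / z_offset favours the first
```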


Journal ArticleDOI
TL;DR: The academictwitteR package is built with academic research in mind; it encourages efficient and responsible storage of data, given the likely large amounts being collected, and provides a number of shortcut and query-building functions to access the new v2 API endpoints.
Abstract: In January, 2021, Twitter announced the “Academic Research Product Track.” This provides academic researchers with greatly expanded access to Twitter data. Existing R packages for querying the Twitter API, such as the popular rtweet package (Kearney, 2019), are yet to introduce functionality to allow users to connect to the new v2 API endpoints with Academic Research Product Track credentials. The academictwitteR package (Barrie & Ho, 2021) is built with academic research in mind. It encourages efficient and responsible storage of data, given the likely large amounts of data being collected, and provides a number of shortcut and query-building functions to access the new v2 API endpoints.

50 citations





Journal ArticleDOI
TL;DR: This work provides users a simple interface for fitting and using state-of-the-art interpretable models, all compatible with scikit-learn, and provides a framework for developing custom tools and rule-based models for interpretability.
Abstract: imodels is a Python package for concise, transparent, and accurate predictive modeling. It provides users a simple interface for fitting and using state-of-the-art interpretable models, all compatible with scikit-learn (Pedregosa et al., 2011). These models can often replace black-box models while improving interpretability and computational efficiency, all without sacrificing predictive accuracy. In addition, the package provides a framework for developing custom tools and rule-based models for interpretability.
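The kind of transparent, auditable model described above can be mimicked with a one-rule classifier exposing the familiar scikit-learn fit/predict interface; this is a hypothetical sketch of the idea, not the imodels API:

```python
class OneRuleClassifier:
    """A transparent one-rule model: threshold a single feature.

    Mimics the scikit-learn fit/predict convention; the learned rule
    can be printed and audited, unlike a black-box model.
    """
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):                    # try every feature
            for t in sorted({row[j] for row in X}):   # and every threshold
                pred = [1 if row[j] > t else 0 for row in X]
                acc = sum(p == yi for p, yi in zip(pred, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, j, t)
        self.accuracy_, self.feature_, self.threshold_ = best
        return self

    def predict(self, X):
        return [1 if row[self.feature_] > self.threshold_ else 0 for row in X]

X = [[0.1, 5], [0.4, 3], [0.6, 8], [0.9, 1]]
y = [0, 0, 1, 1]
clf = OneRuleClassifier().fit(X, y)
print(f"if x[{clf.feature_}] > {clf.threshold_} then 1 else 0")  # the whole model
```

The entire fitted model is one human-readable sentence, which is the point of interpretable modeling: accuracy is traded off against the ability to inspect exactly what the model does.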

Journal ArticleDOI
TL;DR: ngsxfem as discussed by the authors is an add-on library to Netgen/NGSolve, a general purpose, high performance finite element library for the numerical solution of partial differential equations.
Abstract: ngsxfem is an add-on library to Netgen/NGSolve, a general purpose, high performance finite element library for the numerical solution of partial differential equations. The add-on enables the use of geometrically unfitted finite element technologies known under different labels, e.g. XFEM, CutFEM, TraceFEM, Finite Cell, fictitious domain, or cut-cell methods. Both Netgen/NGSolve and ngsxfem are written in C++ with a rich Python interface, through which the library is typically used. ngsxfem is academic software; its primary intention is to facilitate the development and validation of new numerical methods.
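The core idea of geometrically unfitted methods is that the mesh ignores the geometry, which is instead described by a level set function; cells where the level set changes sign are "cut" and receive special treatment (e.g. custom quadrature). A conceptual sketch of that classification step, not the ngsxfem API:

```python
def classify_cells(phi, nx, ny, h):
    """Classify cells of a uniform grid against a level set phi.

    A cell is NEG/POS if phi has one sign at all four corners, and CUT
    otherwise; cut cells are where unfitted methods such as CutFEM
    apply special integration rules.
    """
    kinds = {}
    for i in range(nx):
        for j in range(ny):
            corners = [phi(i * h, j * h), phi((i + 1) * h, j * h),
                       phi(i * h, (j + 1) * h), phi((i + 1) * h, (j + 1) * h)]
            if all(c < 0 for c in corners):
                kinds[(i, j)] = "NEG"
            elif all(c > 0 for c in corners):
                kinds[(i, j)] = "POS"
            else:
                kinds[(i, j)] = "CUT"
    return kinds

# Level set of a disc of radius 0.3 centred in the unit square.
phi = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 - 0.3 ** 2
kinds = classify_cells(phi, nx=10, ny=10, h=0.1)
print(sum(k == "CUT" for k in kinds.values()), "cut cells")
```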

Journal ArticleDOI
TL;DR: Mikropml as discussed by the authors is an easy-to-use R package that implements ML pipelines using regression, support vector machines, decision trees, random forest, or gradient-boosted trees.
Abstract: Machine learning (ML) for classification and prediction based on a set of features is used to make decisions in healthcare, economics, criminal justice and more. However, implementing an ML pipeline including preprocessing, model selection, and evaluation can be time-consuming, confusing, and difficult. Here, we present mikropml (pronounced "meek-ROPE em el"), an easy-to-use R package that implements ML pipelines using regression, support vector machines, decision trees, random forest, or gradient-boosted trees. The package is available on GitHub, CRAN, and conda.

Journal ArticleDOI
TL;DR: The see package, a core pillar of easystats, helps users to utilize visualization for more informative, communicable, and well-rounded scientific reporting.
Abstract: The see package is embedded in the easystats ecosystem, a collection of R packages that operate in synergy to provide a consistent and intuitive syntax when working with statistical models in the R programming language (R Core Team, 2021). Most easystats packages return comprehensive numeric summaries of model parameters and performance. The see package complements these numeric summaries with a host of functions and tools to produce a range of publication-ready visualizations for model parameters, predictions, and performance diagnostics. As a core pillar of easystats, the see package helps users to utilize visualization for more informative, communicable, and well-rounded scientific reporting.


Journal ArticleDOI
TL;DR: This paper discusses the challenges scientists and students face in finding a suitable platform for data analysis and publishing research outputs in quality journals; some web applications used by the agricultural research community do not provide options to generate plots and graphs.
Abstract: Agricultural experiments demand a wide range of statistical tools for analysis, including exploratory analysis, design of experiments, and statistical genetics. It is a challenge for scientists and students to find a suitable platform for data analysis and to publish the research outputs in quality journals. Most of the software available for data analysis is proprietary or lacks a simple user interface; for example, SAS® is available in ICAR (Indian Council of Agricultural Research) for data analysis, but although it is a highly advanced statistical analysis platform, its complexity holds back students and researchers from using it. Some web applications like WASP (https://ccari.res.in/waspnew.html) and OPSTAT (http://14.139.232.166/opstat/) used by the agricultural research community are user friendly, but these applications don't provide options to generate plots and graphs.





Journal ArticleDOI
TL;DR: Iharm3D as discussed by the authors is an open-source C code for simulating black hole accretion systems in arbitrary stationary spacetimes using ideal general-relativistic magnetohydrodynamics (GRMHD).
Abstract: Iharm3D is an open-source C code for simulating black hole accretion systems in arbitrary stationary spacetimes using ideal general-relativistic magnetohydrodynamics (GRMHD). It is an implementation of the HARM ("High Accuracy Relativistic Magnetohydrodynamics") algorithm outlined in Gammie et al. (2003) with updates as outlined in McKinney & Gammie (2004) and Noble et al. (2006). The code is most directly derived from Ryan et al. (2015) but with radiative transfer portions removed. HARM is a conservative finite-volume scheme for solving the equations of ideal GRMHD, a hyperbolic system of partial differential equations, on a logically Cartesian mesh in arbitrary coordinates.
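The defining property of a conservative finite-volume scheme such as HARM is that each cell is updated by the difference of fluxes at its faces, so the discrete total of the conserved variables is preserved exactly; a 1-D scalar advection toy with upwind fluxes shows the structure (illustration of the scheme class only, far simpler than the GRMHD system):

```python
def advect_step(u, c, dt, dx):
    """One conservative finite-volume update for du/dt + c*du/dx = 0.

    Fluxes are evaluated at cell faces (upwind, assuming c > 0) and
    each cell changes by its flux difference, so the total sum(u)*dx
    is conserved to round-off. Periodic boundaries.
    """
    n = len(u)
    flux = [c * u[i - 1] for i in range(n)]   # face i sits left of cell i
    return [u[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]

u = [0.0] * 10
u[3] = 1.0                                    # a pulse of conserved quantity
total = sum(u)
for _ in range(25):
    u = advect_step(u, c=1.0, dt=0.05, dx=0.1)
print(abs(sum(u) - total))                    # conservation error: round-off only
```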


Journal ArticleDOI
TL;DR: The exploding field of single cell transcriptomics has begun to enable deep analysis of gene expression and cell types, but spatial context is lost in the preparation of tissue for these assays.
Abstract: The exploding field of single cell transcriptomics has begun to enable deep analysis of gene expression and cell types, but spatial context is lost in the preparation of tissue for these assays. Recent developments in biochemistry, microfluidics, and microscopy have come together to bring about an “alphabet soup” of technologies that enable sampling gene expression in situ, with varying levels of spatial resolution, sensitivity, and genetic depth. These technologies promise to permit biologists to ask new questions about the spatial relationships between cell type and interactions between gene expression and cell morphology. However, these assays generate very large microscopy datasets which are challenging to process using general microscopy analysis tools. Furthermore, many of these assays require specialized analysis to decode gene expression from multiplexed experimental designs.

Journal ArticleDOI
TL;DR: EUKulele as mentioned in this paper is an open-source software tool designed to assign taxonomy to microeukaryotes detected in meta-omic samples; it complements analysis approaches in other domains by accommodating assembly output and providing concrete metrics reporting the taxonomic completeness of each sample.
Abstract: The assessment of microbial species biodiversity is essential in ecology and evolutionary biology (Reaka-Kudla et al. 1996), but especially challenging for communities of microorganisms found in the environment (Das et al. 2006, Hillebrand et al. 2018). Beyond providing a census of organisms in the ocean, assessing marine microbial biodiversity can reveal how microbes respond to environmental change (Salazar et al. 2017), clarify the ecological roles of community members (Hehemann et al. 2016), and lead to biotechnology discoveries (Das et al. 2006). Computational approaches to characterize taxonomic diversity and phylogeny based on the quality of available data for environmental sequence datasets are fundamental for advancing our understanding of the role of these organisms in the environment. Even more pressing is the need for comprehensive and consistent methods to assign taxonomy to environmentally-relevant microbial eukaryotes. Here, we present EUKulele, an open-source software tool designed to assign taxonomy to microeukaryotes detected in meta-omic samples; it complements analysis approaches in other domains by accommodating assembly output and providing concrete metrics reporting the taxonomic completeness of each sample.
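Taxonomic assignment of the kind described above amounts to matching query sequences against an annotated reference database; a toy alignment-free stand-in using shared k-mers sketches the matching idea (hypothetical sequences and taxa; EUKulele's actual alignment-based scoring is more sophisticated):

```python
def kmers(seq, k=4):
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assign_taxonomy(query, reference, k=4):
    """Assign the reference taxon sharing the most k-mers with the query.

    Real tools score alignments against curated databases and report
    confidence metrics; this shared-k-mer vote only illustrates the
    database-matching idea.
    """
    q = kmers(query, k)
    return max(reference, key=lambda taxon: len(q & kmers(reference[taxon], k)))

# Hypothetical reference sequences for two taxa.
reference = {
    "Emiliania": "ATGGCGTACGTTAGCTAGGCTA",
    "Phaeodactylum": "TTGACCGATCGGATCCGATTGA",
}
hit = assign_taxonomy("ATGGCGTACGTTAGCTAGGCAA", reference)  # near-copy of the first
print(hit)
```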

Journal ArticleDOI
TL;DR: The isoreader package implements an easily extendable interface for IRMS data from common instrument vendor file formats and thus enables the reading and processing of stable isotope data directly from the source, providing a foundational tool for platform-independent, efficient and reproducible data reduction.
Abstract: The measurement and interpretation of the stable isotope composition of any material or molecule has widespread application in disciplines ranging from the earth sciences to ecology, anthropology, and forensics. The naturally occurring differences in the abundance of the stable isotopes of carbon, nitrogen, oxygen, and many other elements provide valuable insight into environmental conditions and sources, fluxes, and mechanisms of material transfer. Because isotopic variations in nature are very small, the measurement itself requires cutting edge analytical instrumentation using isotope ratio mass spectrometry (IRMS) as well as rigorous data reduction procedures for calibration and quality control. The isoreader package implements an easily extendable interface for IRMS data from common instrument vendor file formats and thus enables the reading and processing of stable isotope data directly from the source. This provides a foundational tool for platform-independent, efficient and reproducible data reduction.


Journal ArticleDOI
TL;DR: The Python package libfmp is introduced, which provides implementations of well-established model-based algorithms for various MIR tasks (with a focus on the audio domain), including beat tracking, onset detection, chord recognition, music synchronization, version identification, music segmentation, novelty detection, and audio decomposition.
Abstract: The revolution in music distribution, storage, and consumption has fueled tremendous interest in developing techniques and tools for organizing, structuring, retrieving, navigating, and presenting music-related data. As a result, the academic field of music information retrieval (MIR) has matured over the last 20 years into an independent research area related to many different disciplines, including engineering, computer science, mathematics, and musicology. In this contribution, we introduce the Python package libfmp, which provides implementations of well-established model-based algorithms for various MIR tasks (with a focus on the audio domain), including beat tracking, onset detection, chord recognition, music synchronization, version identification, music segmentation, novelty detection, and audio decomposition. Such traditional approaches not only yield valuable baselines for modern data-driven strategies (e.g., using deep learning) but are also instructive from an educational viewpoint, deepening the understanding of the MIR task and music data at hand. Our libfmp package is inspired by and closely follows conventions introduced by librosa, a widely used Python library containing standardized and flexible reference implementations of many common methods in audio and music processing (McFee et al., 2015). While the two packages overlap concerning basic feature extraction and MIR algorithms, libfmp contains several reference implementations of advanced music processing pipelines not yet covered by librosa (or other open-source software). Whereas the librosa package is intended to facilitate the high-level composition of basic methods into complex pipelines, a major emphasis of libfmp is on the educational side, promoting the understanding of MIR concepts by closely following the textbook on Fundamentals of Music Processing (FMP) (Müller, 2015).
In this way, we hope that libfmp constitutes a valuable complement to existing open-source toolboxes such as librosa while fostering education and research in MIR.
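One of the MIR tasks listed above, novelty detection, can be sketched as spectral flux: the summed positive increases between consecutive feature frames, whose peaks mark onsets or segment boundaries (conceptual toy with made-up feature frames; libfmp ships full reference implementations of such pipelines):

```python
def novelty(frames):
    """Spectral-flux novelty curve: for each frame, sum the positive
    increases over the previous frame's feature values. Peaks in the
    curve indicate moments where new energy appears, e.g. note onsets
    or section boundaries.
    """
    curve = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        curve.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return curve

# Toy feature sequence: energy enters new bands at frames 2 and 4.
frames = [[1, 0, 0], [1, 0, 0], [1, 5, 0], [1, 5, 0], [1, 5, 7]]
curve = novelty(frames)
print(curve)  # peaks at the two change points
```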


Journal ArticleDOI
TL;DR: One structural characteristic of networks that is investigated frequently across various sciences is the detection of communities, which are strongly connected subgraphs in the network such as groups of friends, thematic fields, or latent factors.
Abstract: Modeling complex phenomena as networks constitutes one of the most versatile, if not the most versatile, fields of research (Barabási, 2011). Indeed, many interconnected entities can be represented as networks, in which entities are called nodes and their connections are called edges. For instance, networks can represent friendships between people, hyperlinks between web pages, or correlations between questionnaire items. One structural characteristic of networks that is investigated frequently across various sciences is the detection of communities (Fortunato, 2010). Communities are strongly connected subgraphs in the network such as groups of friends, thematic fields, or latent factors. Most community detection algorithms thereby put each node in only one community. However, nodes are often shared by multiple communities, e.g., when a person is part of multiple groups of friends, web pages belong to different thematic fields, or items load on multiple factors. The most popular community detection algorithm aimed at identifying such overlapping communities is the clique percolation algorithm (Farkas et al., 2007; Palla et al., 2005).
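The clique percolation idea can be stated compactly: a community is a connected union of k-cliques, where two k-cliques count as adjacent if they share k-1 nodes, so one node may belong to several communities. A brute-force sketch for small graphs (not a library implementation; real tools enumerate cliques far more efficiently):

```python
from itertools import combinations

def clique_percolation(edges, k=3):
    """Overlapping communities via clique percolation (Palla et al., 2005).

    Finds all k-cliques by brute force, then takes connected components
    of the clique-adjacency graph (cliques sharing k-1 nodes).
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    communities, unseen = [], set(range(len(cliques)))
    while unseen:
        stack = [unseen.pop()]
        members = set()
        while stack:                       # flood-fill over adjacent cliques
            i = stack.pop()
            members |= cliques[i]
            linked = {j for j in unseen if len(cliques[i] & cliques[j]) == k - 1}
            unseen -= linked
            stack.extend(linked)
        communities.append(members)
    return communities

# Two triangles sharing node 3: the node ends up in both communities.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
coms = clique_percolation(edges, k=3)
print(coms)
```

With two triangles joined only at node 3, the triangles share one node (fewer than k-1 = 2), so they form separate communities, yet node 3 belongs to both, exactly the overlap that one-community-per-node algorithms cannot express.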