
Showing papers in "The Journal of Open Source Software" in 2020



Journal ArticleDOI
TL;DR: Recommender systems aim to provide users with a list of recommended items that a service offers; for example, a video streaming service will typically rely on a recommender system to propose a personalized list of movies or series to each of its users.
Abstract: Recommender systems aim to provide users with a list of recommended items that a service offers. For example, a video streaming service will typically rely on a recommender system to propose a personalized list of movies or series to each of its users. A typical problem in recommendation is rating prediction: given an incomplete dataset of user-item interactions in the form of numerical ratings (e.g., on a scale from 1 to 5), the goal is to predict the missing ratings for all remaining user-item pairs.

215 citations
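The rating-prediction task in the entry above can be illustrated with a classic baseline estimate: a global mean plus a per-user and a per-item bias. The sketch below is a minimal, stdlib-only illustration under that assumption; `fit_baseline` and `predict` are hypothetical names and are not the API of the package described in the entry.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """Fit a global mean, item biases and user biases from (user, item, rating) triples.

    Simple two-pass estimate: item biases first, then user biases on the residuals.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    item_dev = defaultdict(list)
    for _, item, r in ratings:
        item_dev[item].append(r - mu)
    b_item = {i: sum(d) / len(d) for i, d in item_dev.items()}
    user_dev = defaultdict(list)
    for user, item, r in ratings:
        user_dev[user].append(r - mu - b_item[item])
    b_user = {u: sum(d) / len(d) for u, d in user_dev.items()}
    return mu, b_user, b_item


def predict(model, user, item, low=1.0, high=5.0):
    """Predict a missing rating, clipped to the rating scale [low, high]."""
    mu, b_user, b_item = model
    # Unknown users or items fall back to a zero bias.
    estimate = mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)
    return min(high, max(low, estimate))


ratings = [("alice", "m1", 5), ("alice", "m2", 3), ("bob", "m1", 4),
           ("bob", "m3", 2), ("carol", "m2", 4), ("carol", "m3", 1)]
model = fit_baseline(ratings)
print(predict(model, "alice", "m3"))  # the missing alice/m3 rating
```

Real systems usually regularize the biases (shrinking them toward zero for users or items with few ratings) and add latent-factor terms on top of this baseline.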


Journal ArticleDOI
TL;DR: The performance of Spleeter's pre-trained models is very close to the published state of the art, and they are among the best-performing publicly released 4-stem separation models on the common musdb18 benchmark.
Abstract: The performance of the pre-trained models is very close to the published state of the art, and the 4-stem model is one of the best-performing 4-stem separation models on the common musdb18 benchmark (Rafii, Liutkus, Stöter, Mimilakis, & Bittner, 2017) to be publicly released. Spleeter is also very fast: using the pre-trained 4-stem model on a single Graphics Processing Unit (GPU), it can separate a mixed audio file into 4 stems 100 times faster than real time (we note, though, that the model cannot be applied in real time because it needs buffering).

175 citations


Journal ArticleDOI
TL;DR: The ggalluvial package adopts a distinctive geological nomenclature to distinguish “alluvial plots” and their graphical elements from Sankey diagrams and parallel sets plots, which the author hopes will prove useful as these visualization tools converge toward common standards.
Abstract: The package makes two key contributions to the R ecosystem. First, ggalluvial anchors the imprecise notion of an alluvial diagram to the rigid grammar of graphics (Wilkinson, 2006), which lends the plots more precise meaning and opens up many combinatorial possibilities. Second, ggalluvial adopts a distinctive geological nomenclature to distinguish “alluvial plots” and their graphical elements from Sankey diagrams and parallel sets plots, which I hope will prove useful as these visualization tools converge toward common standards.

148 citations


Journal ArticleDOI
TL;DR: correlation is a toolbox for the R language (R Core Team, 2019), part of the easystats collection, that focuses on correlation analysis and allows for the computation of many different kinds of correlations.
Abstract: Correlation tests are arguably one of the most commonly used statistical procedures and serve as a basis in many applications, such as exploratory data analysis, structural modelling, and data engineering. In this context, we present correlation, a toolbox for the R language (R Core Team, 2019) and part of the easystats collection, focused on correlation analysis. Its goal is to be lightweight and easy to use, and to allow for the computation of many different kinds of correlations, such as:

144 citations
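The correlation toolbox in the entry above is an R package; as a language-neutral illustration (not the package's code), the Python sketch below computes two of the most common coefficients it refers to: Pearson's r, and Spearman's rho computed as Pearson's r on average ranks so that ties are handled. The helper names are hypothetical.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(pearson([1, 2, 3], [2, 4, 6]))
print(spearman([1, 2, 3, 4], [10, 20, 30, 25]))
```

Spearman's rho only depends on the ordering of the values, so it is robust to monotonic transformations that would change Pearson's r.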


Journal ArticleDOI
TL;DR: The recent growth of data science is partly fueled by the ever-growing amount of data and by concurrent major developments in statistical modeling, with new and powerful models and frameworks becoming accessible to users.
Abstract: The recent growth of data science is partly fueled by the ever-growing amount of data and by concurrent major developments in statistical modeling, with new and powerful models and frameworks becoming accessible to users. Although some generic functions exist for obtaining model summaries and parameters, many package-specific modeling functions do not provide such methods, leaving users without easy access to this valuable information.

142 citations


Journal ArticleDOI
TL;DR: Efficient implementations of bio-inspired and evolutionary algorithms sit alongside state-of-the-art optimization algorithms and can be used concurrently to build an optimization pipeline that exploits algorithmic cooperation via the asynchronous, generalized island model.
Abstract: Efficient implementations of bio-inspired and evolutionary algorithms sit alongside state-of-the-art optimization algorithms (simplex methods, SQP methods, interior-point methods, etc.) and can be used concurrently (also together with algorithms coded by the user) to build an optimization pipeline that exploits algorithmic cooperation via the asynchronous, generalized island model (Izzo, Ruciński, & Biscani, 2012).

124 citations
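The island model mentioned in the entry above evolves several populations ("islands") independently and periodically migrates good solutions between them. The following is a deliberately simplified, synchronous sketch in plain Python, assuming a ring migration topology and a basic elitist evolution strategy on every island; all names are hypothetical, it is not the library's API, and it omits the asynchronous execution and the mixing of different algorithms that the entry describes.

```python
import random


def sphere(x):
    """Test objective: sum of squares, with minimum 0 at the origin."""
    return sum(v * v for v in x)


def evolve(pop, fitness, rng, sigma=0.1):
    """One generation of an elitist (mu + lambda) strategy with Gaussian mutation."""
    offspring = [[v + rng.gauss(0.0, sigma) for v in ind] for ind in pop]
    return sorted(pop + offspring, key=fitness)[: len(pop)]


def island_model(fitness, dim=2, n_islands=4, pop_size=10,
                 generations=200, migrate_every=10, seed=0):
    rng = random.Random(seed)
    islands = [sorted(([rng.uniform(-5.0, 5.0) for _ in range(dim)]
                       for _ in range(pop_size)), key=fitness)
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently (sequentially here, not asynchronously).
        islands = [evolve(pop, fitness, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: island i's worst individual is replaced by a copy
            # of island i-1's best (every population is sorted, best first).
            bests = [list(pop[0]) for pop in islands]
            for i, pop in enumerate(islands):
                pop[-1] = bests[i - 1]
                pop.sort(key=fitness)
    return min((pop[0] for pop in islands), key=fitness)


best = island_model(sphere)
print(best, sphere(best))
```

Migration lets a good solution found on one island seed the search on the others, while the separate populations keep some diversity.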


Journal ArticleDOI
TL;DR: CMasher is a Python package that provides a curated collection of scientific colormaps, showcased in the online documentation (https://www.cmasher.org).
Abstract: CMasher is a Python package that provides a curated collection of scientific colormaps, showcased in the online documentation. The colormaps in CMasher are all designed to be perceptually uniform sequential using the 'viscm' package; most of them are color-vision deficiency friendly; and they cover a wide range of color combinations to accommodate most applications. It aims to provide several alternatives to commonly used colormaps, like 'chroma' and 'rainforest' for 'jet'; 'sunburst' for 'hot'; 'neutral' for 'binary'; and 'fusion' and 'redshift' for 'coolwarm'. With CMasher, I hope to help others pick the correct colormap for the job.

118 citations
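One simple, necessary (but not sufficient) property of a perceptually uniform sequential colormap like those in the entry above is that perceived lightness changes monotonically along the map. The sketch below illustrates that check; it is an assumption-laden stand-in, not how CMasher or viscm work internally (viscm performs a far more thorough analysis). It converts sRGB samples with channels in [0, 1] to CIELAB lightness L* and tests monotonicity.

```python
def srgb_to_linear(c):
    """Inverse sRGB companding for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lightness(rgb):
    """CIELAB lightness L* (0 to 100) of an sRGB color with channels in [0, 1]."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # relative luminance, D65 white
    delta = 6 / 29
    f = y ** (1 / 3) if y > delta ** 3 else y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def is_lightness_monotonic(colors):
    """True if L* strictly increases or strictly decreases along the samples."""
    values = [lightness(c) for c in colors]
    increasing = all(a < b for a, b in zip(values, values[1:]))
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return increasing or decreasing


grays = [(v, v, v) for v in (0.0, 0.25, 0.5, 0.75, 1.0)]
jet_like = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]  # blue, cyan, yellow, red
print(is_lightness_monotonic(grays), is_lightness_monotonic(jet_like))
```

The rainbow-like samples fail because lightness rises from blue to yellow and then drops again at red, which is one reason 'jet' is a poor sequential colormap.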



Journal ArticleDOI
TL;DR: Maximizing the impact of data-driven model discovery methods, such as the sparse identification of nonlinear dynamics (SINDy), requires tools that make them widely accessible to scientists across domains and at various levels of mathematical expertise.
Abstract: Scientists have long quantified empirical observations by developing mathematical models that characterize the observations, have some measure of interpretability, and are capable of making predictions. Dynamical systems models in particular have been widely used to study, explain, and predict system behavior in a wide range of application areas, with examples ranging from Newton’s laws of classical mechanics to the Michaelis-Menten model of enzyme kinetics. While governing laws and equations were traditionally derived by hand, the current growth of available measurement data and resulting emphasis on data-driven modeling motivates algorithmic approaches for model discovery. A number of such approaches have been developed in recent years and have generated widespread interest, including Eureqa (Schmidt & Lipson, 2009), sure independence screening and sparsifying operator (Ouyang, Curtarolo, Ahmetcik, Scheffler, & Ghiringhelli, 2018), and the sparse identification of nonlinear dynamics (SINDy) (Brunton, Proctor, & Kutz, 2016). Maximizing the impact of these model discovery methods requires tools to make them widely accessible to scientists across domains and at various levels of mathematical expertise.

98 citations
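SINDy, cited in the entry above, models the time derivative as a sparse linear combination of candidate functions. A minimal sketch of the sequentially thresholded least-squares idea from the original SINDy paper follows, in plain Python with a hand-rolled normal-equation solver. Assumptions (not from the entry): a single state variable, a polynomial library [1, x, x², x³], a threshold of 0.1, and an exactly known derivative; real data needs numerically estimated, often noisy derivatives and a robust least-squares routine. Function names are hypothetical.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(theta, y, active):
    """Least-squares fit using only the active library columns (others stay 0)."""
    cols = [j for j, on in enumerate(active) if on]
    ata = [[sum(row[i] * row[j] for row in theta) for j in cols] for i in cols]
    aty = [sum(row[i] * yk for row, yk in zip(theta, y)) for i in cols]
    coef = [0.0] * len(active)
    for j, value in zip(cols, solve_linear(ata, aty)):
        coef[j] = value
    return coef


def stlsq(theta, y, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares: prune small terms and refit."""
    active = [True] * len(theta[0])
    coef = least_squares(theta, y, active)
    for _ in range(max_iter):
        new_active = [abs(c) >= threshold for c in coef]
        if new_active == active:
            break
        active = new_active
        if not any(active):
            return [0.0] * len(coef)
        coef = least_squares(theta, y, active)
    return coef


xs = [-2.0 + 0.1 * k for k in range(41)]
theta = [[1.0, x, x ** 2, x ** 3] for x in xs]  # candidate functions
dxdt = [-2.0 * x + 0.5 * x ** 3 for x in xs]    # true dynamics: dx/dt = -2x + 0.5x^3
print(stlsq(theta, dxdt))
```

The recovered coefficient vector is sparse: only the x and x³ terms survive the thresholding, which is what makes the discovered model interpretable.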


Journal ArticleDOI
TL;DR: Prochaska et al. present PypeIt, a Python package for semi-automated reduction of astronomical spectroscopic data, whose online documentation includes a complete list of the input parameters and available functionality.
Abstract: Author(s): Prochaska, J; Hennawi, Joseph; Westfall, Kyle; Cooke, Ryan; Wang, Feige; Hsyu, Tiffany; Davies, Frederick; Farina, Emanuele; Pelliccia, Debora. PypeIt is a Python package for semi-automated reduction of astronomical spectroscopic data. Its algorithms build on the decades-long development of previous data reduction pipelines by the developers (Bernstein, Burles, & Prochaska, 2015; Bochanski et al., 2009). The reduction procedure, including a complete list of the input parameters and available functionality, is provided as online documentation hosted by Read the Docs, which is regularly updated (https://pypeit.readthedocs.io/en/latest/). Release v1.0.3 serves the following spectrographs: Gemini/GNIRS, Gemini/GMOS, Gemini/FLAMINGOS 2, Lick/Kast, Magellan/MagE, Magellan/Fire, MDM/OSMOS, Keck/DEIMOS (600ZD, 830G, 1200G), Keck/LRIS, Keck/MOSFIRE (J and Y gratings tested), Keck/NIRES, Keck/NIRSPEC (low-dispersion), LBT/Luci-I, Luci-II, LBT/MODS (beta), NOT/ALFOSC (grism4), VLT/X-Shooter (VIS, NIR), VLT/FORS2 (300I, 300V), WHT/ISIS.

Journal ArticleDOI
TL;DR: The authors include Olga Stamati, Edward Andò, Emmanuel Roubin, Rémi Cailletaud, Max Wiebicke, Cyrille Couture, and colleagues.
Abstract: Olga Stamati, Edward Andò, Emmanuel Roubin, Rémi Cailletaud, Max Wiebicke, Gustavo Pinzon, Cyrille Couture, Ryan C. Hurley, Robert Caulk, Denis Caillerie, Takashi Matsushima, Pierre Bésuelle, Félix Bertoni, Tom Arnaud, Alejandro Ortega Laborin, Riccardo Rorato, Yue Sun, Alessandro Tengattini, Olumide Okubadejo, Jean-Baptiste Colliat, Mohammad Saadatfar, Fernando E. Garcia, Christos Papazoglou, Ilija Vego, Sébastien Brisard, Jelke Dijkstra, and Georgios Birmpilis

Journal ArticleDOI
TL;DR: geemap is a Python package for interactive mapping with Google Earth Engine (GEE), which is a cloud computing platform with a multi-petabyte catalog of satellite imagery and geospatial datasets (e.g., Landsat, Sentinel, MODIS, NAIP).
Abstract: geemap is a Python package for interactive mapping with Google Earth Engine (GEE), which is a cloud computing platform with a multi-petabyte catalog of satellite imagery and geospatial datasets (e.g., Landsat, Sentinel, MODIS, NAIP) (Gorelick et al., 2017). During the past few years, GEE has become very popular in the geospatial community and it has empowered numerous environmental applications at local, regional, and global scales. Some of the notable environmental applications include mapping global forest change (Hansen et al., 2013), global urban change (Liu et al., 2020), global surface water change (Pekel, Cottam, Gorelick, & Belward, 2016), wetland inundation dynamics (Wu et al., 2019), vegetation phenology (Li et al., 2019), and time series analysis (Kennedy et al., 2018).

Journal ArticleDOI
TL;DR: larq is an ecosystem of Python packages for Binarized Neural Networks (BNNs) and other Quantized Neural Networks (QNNs), intended to help researchers resolve outstanding questions about how to improve computational efficiency in neural networks.
Abstract: Modern deep learning methods have been successfully applied to many different tasks and have the potential to revolutionize everyday life. However, existing neural networks that use 32 bits to encode each weight and activation often have an energy budget that far exceeds the capabilities of mobile or embedded devices. One common way to improve computational efficiency is to reduce the precision of the network to 16 or 8 bits, also known as quantization. Binarized Neural Networks (BNNs) represent an extreme case of quantized networks that cannot be viewed as approximations to real-valued networks and therefore require special tools and optimization strategies (Helwegen et al., 2019). In these networks, both weights and activations are restricted to {−1,+1} (Hubara, Courbariaux, Soudry, El-Yaniv, & Bengio, 2016). Compared to an equivalent 8-bit quantized network, BNNs require 8 times less memory and 8 times fewer memory accesses, which drastically reduces energy consumption when deployed on optimized hardware (Hubara et al., 2016). However, many open research questions remain before BNNs and other extremely quantized neural networks become widespread in industry. larq is an ecosystem of Python packages for BNNs and other Quantized Neural Networks (QNNs). It is intended to help researchers resolve these outstanding questions.
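The memory and compute savings of binarization described above come from storing each {−1, +1} value as a single bit and replacing multiply-accumulate with XNOR and popcount. The plain-Python sketch below illustrates that arithmetic; it is not larq's API, the function names are hypothetical, and mapping 0 to +1 in `binarize` is a convention assumed here.

```python
def binarize(weights):
    """Map real values to {-1, +1}; zero maps to +1 (a convention assumed here)."""
    return [1 if w >= 0 else -1 for w in weights]


def pack_bits(signs):
    """Store +1 as bit 1 and -1 as bit 0, one bit per value."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # positions where signs agree
    return matches - (n - matches)  # agreements add +1, disagreements add -1


def storage_bytes(n_weights, bits_per_weight):
    """Bytes needed to store n_weights values at the given precision."""
    return (n_weights * bits_per_weight + 7) // 8


a, b = [1, -1, 1, 1], [1, 1, -1, 1]
print(xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a)))
print(storage_bytes(1000, 8), storage_bytes(1000, 1))  # 8-bit vs binary storage
```

The 1000 versus 125 bytes in the last line is the 8-fold memory reduction quoted in the abstract; on hardware, the XNOR and popcount steps operate on whole machine words at once.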

Journal ArticleDOI
TL;DR: pyOptSparse is an optimization framework designed for constrained nonlinear optimization of large sparse problems; using an object-oriented approach, it provides a unified interface for various gradient-free and gradient-based optimizers.
Abstract: pyOptSparse is an optimization framework designed for constrained nonlinear optimization of large sparse problems and provides a unified interface for various gradient-free and gradient-based optimizers. By using an object-oriented approach, the software maintains independence between the optimization problem formulation and the implementation of the specific optimizers. The code is MPI-wrapped to enable execution of expensive parallel analyses and gradient evaluations, such as when using computational fluid dynamics (CFD) simulations, which can require hundreds of processors. The optimization history can be stored in a database file, which can then be used both for post-processing and restarting another optimization. A graphical user interface application is provided to visualize the optimization history interactively.
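The separation between problem formulation and optimizer described above can be sketched with a problem object that every optimizer uses through the same interface, and that records each evaluation as a history. The plain-Python sketch below is hypothetical throughout (`Problem`, `Optimizer`, `ProjectedGradientDescent` are not pyOptSparse's API), and it handles only simple bound constraints rather than general nonlinear constraints.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def _clip(v: float, lo: float, hi: float) -> float:
    return min(max(v, lo), hi)


@dataclass
class Problem:
    """Problem formulation only; it knows nothing about any optimizer."""
    objective: Callable[[List[float]], float]
    bounds: List[Tuple[float, float]]
    history: List[Tuple[List[float], float]] = field(default_factory=list)

    def evaluate(self, x: List[float]) -> float:
        value = self.objective(x)
        self.history.append((list(x), value))  # every evaluation is recorded
        return value


class Optimizer:
    """Common interface: optimizers interact with a problem only via Problem."""

    def solve(self, problem: Problem, x0: List[float]) -> List[float]:
        raise NotImplementedError


class ProjectedGradientDescent(Optimizer):
    """Gradient descent with central-difference gradients, clipped to the bounds."""

    def __init__(self, step: float = 0.1, iterations: int = 200, fd_step: float = 1e-6):
        self.step = step
        self.iterations = iterations
        self.fd_step = fd_step

    def _gradient(self, problem: Problem, x: List[float]) -> List[float]:
        grad = []
        for i in range(len(x)):
            forward, backward = list(x), list(x)
            forward[i] += self.fd_step
            backward[i] -= self.fd_step
            diff = problem.evaluate(forward) - problem.evaluate(backward)
            grad.append(diff / (2 * self.fd_step))
        return grad

    def solve(self, problem: Problem, x0: List[float]) -> List[float]:
        x = [_clip(v, lo, hi) for v, (lo, hi) in zip(x0, problem.bounds)]
        for _ in range(self.iterations):
            g = self._gradient(problem, x)
            x = [_clip(v - self.step * gi, lo, hi)
                 for v, gi, (lo, hi) in zip(x, g, problem.bounds)]
        problem.evaluate(x)  # record the final point
        return x


problem = Problem(lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2,
                  bounds=[(0.0, 2.0), (-5.0, 5.0)])
print(ProjectedGradientDescent().solve(problem, [0.0, 0.0]))
```

Because the optimizer only calls `problem.evaluate`, another optimizer class could be swapped in without touching the problem definition, and `problem.history` could be written to a file for post-processing.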

Journal ArticleDOI
TL;DR: The authors propose an R package that wraps the Earth Engine Python API to provide R users with a familiar interface, rapid development features, and flexibility to analyze data using open-source, third-party packages.
Abstract: Google Earth Engine (Gorelick et al., 2017) is a cloud computing platform designed for planetary-scale environmental data analysis. Its multi-petabyte data catalog and computation services are accessed via an Internet-accessible API. The API is exposed through JavaScript and Python client libraries. Google provides a browser-based IDE for the JavaScript API, and while convenient and useful for rapid data exploration and script development, it does not allow third-party package integration, relying solely on Google Maps and Google Charts for data visualization, and proprietary systems for metadata viewing and asset management. In contrast, the Python and Node.js distributions offer much flexibility for developers to integrate with third-party libraries. However, without the structure of a dedicated IDE, casual users can be left directionless and daunted. A significant gap exists between these two offerings (Google-supported JavaScript IDE and base client libraries) where convenience and flexibility meet. We propose to fill this gap with an R package that wraps the Earth Engine Python API to provide R users with a familiar interface, rapid development features, and flexibility to analyze data using open-source, third-party packages.

Journal ArticleDOI
TL;DR: Time integration techniques, which include backward difference formulas and Runge-Kutta methods, continue to be an active area of research; common spatial discretization approaches include the finite difference method, the finite volume method, and the finite element method.
Abstract: Differential equations emerge in various scientific and engineering domains for modeling physical phenomena. Most differential equations of practical interest are analytically intractable. Traditionally, differential equations are solved by numerical methods. Sophisticated algorithms exist to integrate differential equations in time and space. Time integration techniques continue to be an active area of research and include backward difference formulas and Runge-Kutta methods (Conde, Gottlieb, Grant, & Shadid, 2017). Common spatial discretization approaches include the finite difference method (FDM), finite volume method (FVM), and finite element method (FEM) as well as spectral methods such as the Fourier-spectral method. These classical methods have been studied in detail and much is known about their convergence properties. Moreover, highly optimized codes exist for solving differential equations of practical interest with these techniques (Seefeldt et al., 2017; Smith & Abeysinghe, 2017). While these methods are efficient and well-studied, their expressiveness is limited by their function representation.
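To make the time-integration methods mentioned above concrete, the sketch below compares explicit Euler with the classical fourth-order Runge-Kutta method on the test problem y' = −y, y(0) = 1, whose exact solution at t = 1 is e⁻¹. The test problem and step count are assumptions chosen for illustration; backward difference formulas (implicit methods) are not shown.

```python
import math


def euler_step(f, t, y, h):
    """Explicit (forward) Euler: first-order accurate."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """Classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, t0, t1, n_steps):
    """Advance y from t0 to t1 with n_steps equal steps of the given method."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    return -y


exact = math.exp(-1.0)
err_euler = abs(integrate(euler_step, decay, 1.0, 0.0, 1.0, 10) - exact)
err_rk4 = abs(integrate(rk4_step, decay, 1.0, 0.0, 1.0, 10) - exact)
print(err_euler, err_rk4)
```

With the same ten steps, Euler's error is on the order of 10⁻², while RK4's is several orders of magnitude smaller, reflecting their first- versus fourth-order accuracy.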

Journal ArticleDOI
TL;DR: In this paper, a peridynamics EMU nodal discretization implementation with the C++ Standard Library for Concurrency and Parallelism (HPX), an open source asynchronous many task run time system, is presented.
Abstract: Peridynamics is a non-local generalization of continuum mechanics tailored to address discontinuous displacement fields arising in fracture mechanics. Like many non-local approaches, peridynamics requires considerable computing resources to solve practical problems. Several implementations of peridynamics utilizing CUDA, OpenCL, and MPI have been developed to address this important issue. On modern supercomputers, asynchronous many-task systems are emerging to address the new architecture of computational nodes. This paper presents a peridynamics EMU nodal discretization implementation with the C++ Standard Library for Concurrency and Parallelism (HPX), an open-source asynchronous many-task run-time system. The code is designed for modular expandability, so that it is simple to extend with new material models or discretizations. The code is convergent for implicit time integration and recovers theoretical solutions. For explicit time integration, convergence results are presented to showcase the agreement of results with theoretical claims in previous works. Two benchmark tests of code scalability demonstrate agreement between this code's scalability and theoretical estimations.
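The non-locality described above means every node interacts with all nodes within a finite distance, the horizon. The serial Python sketch below illustrates this for a 1-D bar using a bond-based model, in which each bond's force is proportional to its stretch. The bond-based material model, the micromodulus, the nodal volume, and all names are assumptions for illustration only; the paper's HPX code and its material models may differ, and bond breaking (damage) is omitted.

```python
def families(xs, horizon):
    """Brute-force neighbor search: j is in i's family if |x_j - x_i| <= horizon."""
    return [[j for j in range(len(xs)) if j != i and abs(xs[j] - xs[i]) <= horizon]
            for i in range(len(xs))]


def internal_forces(xs, us, fams, micromodulus, volume):
    """Bond-based force density at each node of a 1-D bar (no bond breaking)."""
    ys = [x + u for x, u in zip(xs, us)]  # deformed positions
    forces = []
    for i, fam in enumerate(fams):
        total = 0.0
        for j in fam:
            ref = xs[j] - xs[i]  # bond in the reference configuration
            cur = ys[j] - ys[i]  # bond in the deformed configuration
            stretch = (abs(cur) - abs(ref)) / abs(ref)
            direction = 1.0 if cur > 0 else -1.0  # unit vector from i toward j
            total += micromodulus * stretch * direction * volume
        forces.append(total)
    return forces


dx = 0.1
xs = [dx * k for k in range(21)]
strain = 0.01
us = [strain * x for x in xs]          # uniform stretching of the bar
fams = families(xs, horizon=3.5 * dx)  # three neighbors on each side
forces = internal_forces(xs, us, fams, micromodulus=1.0, volume=dx)
print(forces[0], forces[10], forces[-1])
```

Under uniform stretching, an interior node feels balanced pulls from both sides and its net force vanishes, while end nodes, which have neighbors on one side only, are pulled inward. The double loop over nodes and families is the work that parallel implementations distribute.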

Journal ArticleDOI
TL;DR: impedance.py is a community-driven Python package for making the analysis of electrochemical impedance spectroscopy (EIS) data easier and more reproducible.
Abstract: impedance.py is a community-driven Python package for making the analysis of electrochemical impedance spectroscopy (EIS) data easier and more reproducible. impedance.py currently provides several useful features commonly used in the typical impedance analysis workflow:
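A typical step in the impedance analysis workflow mentioned above is evaluating the complex impedance of an equivalent-circuit model over a range of frequencies. The sketch below uses Python's built-in complex numbers for a series resistor R0 plus a parallel R1-C1 element, a simplified Randles-type circuit without a diffusion element. The circuit choice and function names are assumptions for illustration and are not impedance.py's API.

```python
import math


def rc_circuit_impedance(omega, r0, r1, c1):
    """Z(omega) = R0 + R1 / (1 + j*omega*R1*C1): series R0 plus parallel R1 || C1."""
    return r0 + r1 / (1 + 1j * omega * r1 * c1)


def impedance_spectrum(freqs_hz, r0, r1, c1):
    """Impedance at each frequency given in hertz (omega = 2*pi*f)."""
    return [rc_circuit_impedance(2 * math.pi * f, r0, r1, c1) for f in freqs_hz]


# At omega = 1/(R1*C1) the arc of the Nyquist plot reaches its top:
# Z = R0 + R1/2 - j*R1/2.
print(rc_circuit_impedance(10.0, r0=10.0, r1=100.0, c1=1e-3))
print(impedance_spectrum([0.1, 1.0, 10.0, 100.0], 10.0, 100.0, 1e-3))
```

At low frequency the capacitor blocks current and Z tends to R0 + R1; at high frequency it shorts R1 and Z tends to R0. Fitting such a model to measured spectra is what recovers R0, R1, and C1.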

Journal ArticleDOI
TL;DR: The Gridap library provides a feature-rich set of discretization techniques, including continuous and discontinuous FE methods with Lagrangian, Raviart-Thomas, or Nédélec interpolations, and supports a wide range of problem types including linear, nonlinear, single-field, and multi-field PDEs.
Abstract: Gridap is a new Finite Element (FE) framework, written exclusively in the Julia programming language, for the numerical simulation of a wide range of mathematical models governed by partial differential equations (PDEs). The library provides a feature-rich set of discretization techniques, including continuous and discontinuous FE methods with Lagrangian, Raviart-Thomas, or Nédélec interpolations, and supports a wide range of problem types including linear, nonlinear, single-field, and multi-field PDEs (see (Badia, Martín, & Principe, 2018, Section 3) for a detailed presentation of the mathematical abstractions behind the implementation of these FE methods). Gridap is designed to help application experts easily simulate real-world problems, to help researchers improve productivity when developing new FE-related techniques, and to be used in numerical PDE courses.

Journal ArticleDOI
TL;DR: In urban and developed areas, effective stormwater management that routes and detains stormwater helps to mitigate these impacts and improve water quality.
Abstract: Stormwater management seeks to reduce runoff from rain or melted snow and improve water quality. Where it can absorb into soil, runoff is filtered and returns to streams, rivers, and aquifers, but in developed areas, precipitation often cannot soak into the ground: impervious surfaces (e.g., pavement, buildings) and already saturated soils can create excess runoff. This water, which can contain pollutants, then runs across urban surfaces and into storm drains, drainage ditches, and sewer systems. Stormwater runoff can cause flooding, erosion, infrastructure and habitat damage, and contamination (including combined and sanitary sewer overflows). In urban and developed areas, effective stormwater management that routes and detains stormwater helps to mitigate these impacts and improve water quality.

Journal ArticleDOI
TL;DR: Analyses to determine differences in central tendency, e.g., mean or median values, are an important application of statistics, and such comparisons often must be done with paired samples, i.e., populations that are dependent on each other.
Abstract: Analyses to determine differences in central tendency, e.g., mean or median values, are an important application of statistics. Often, such comparisons must be done with paired samples, i.e., populations that are dependent on each other. This is required, for example, if the performance of different machine learning algorithms is to be compared on multiple data sets. The performance measures on each data set are then the paired samples, and the difference in central tendency can be used to rank the different algorithms. This problem is not new, and how such tests can be done was already described in the well-known article by Demšar (2006).
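For the setting above (several algorithms evaluated on the same data sets), Demšar (2006) discusses ranking the algorithms on each data set and comparing their mean ranks with the Friedman test. The sketch below computes the mean ranks and the Friedman statistic χ²_F = 12N / (k(k+1)) · (Σ R_j² − k(k+1)²/4) in plain Python. It is an illustration, not the package's implementation; the names are hypothetical, and the p-value (χ² distribution with k − 1 degrees of freedom) and post-hoc tests are left out.

```python
def rank_row(scores):
    """Rank 1 = highest score; ties share the average of the ranks they span."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(table):
    """table[d][a] = performance of algorithm a on data set d (higher is better).

    Returns the mean rank of each algorithm and the Friedman chi-square statistic.
    """
    n = len(table)     # number of data sets (paired samples)
    k = len(table[0])  # number of algorithms
    rank_rows = [rank_row(row) for row in table]
    mean_ranks = [sum(r[a] for r in rank_rows) / n for a in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return mean_ranks, chi2


table = [[0.90, 0.80, 0.70],
         [0.85, 0.80, 0.60],
         [0.95, 0.90, 0.50],
         [0.70, 0.65, 0.60]]
print(friedman(table))
```

Ranking within each data set is what makes the test respect the pairing: only the relative order of the algorithms on the same data set matters, not the raw score scale.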

Journal ArticleDOI
TL;DR: Fergus R Cooper, Ruth E Baker, Miguel O Bernabeu, Rafel Bordas, Louise Bowler, Alfonso Bueno-Orovio, Helen M Byrne, Valentina Carapella, Louie Cardone-Noott, Jonathan Cooper, Sara Dutta, Benjamin D Evans, Alexander G Fletcher, James A Grogan, Wenxian Guo, Daniel G
Abstract: Funding: UK Engineering and Physical Sciences Research Council [grant number EP/N509711/1 (J.K.)].

Journal ArticleDOI
TL;DR: The authors are affiliated with institutions including Johns Hopkins University, the Jet Propulsion Laboratory, the USGS Astrogeology Science Center, and ETH Zurich.
Abstract: (1) Johns Hopkins University; (2) None; (3) Jet Propulsion Laboratory, California Institute of Technology; (4) Latchmoor Services, LLC; (5) Planetary Transportation Systems GmbH; (6) USGS Astrogeology Science Center; (7) Institute of Experimental and Applied Physics, University of Kiel; (8) DLR Gesellschaft für Raumfahrtanwendungen (GfR) mbH; (9) ODC Space; (10) Laboratory for Atmospheric and Space Physics, University of Colorado; (11) ETH Zurich; (12) Planetary Science Institute; (13) Collins Aerospace; (14) GFD Dennou Club. DOI: 10.21105/joss.02050

Journal ArticleDOI
TL;DR: The finite element method (FEM) is a flexible computational technique for the discretization and solution of PDEs, especially in the case of complex spatial domains.
Abstract: Partial differential equations (PDEs)—such as the Navier–Stokes equations in fluid mechanics, the Maxwell equations in electromagnetism, and the Schrödinger equation in quantum mechanics—are the basic building blocks of modern physics and engineering. The finite element method (FEM) is a flexible computational technique for the discretization and solution of PDEs, especially in the case of complex spatial domains.
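The finite element method described above can be shown in its smallest form: linear elements for −u″ = f on [0, 1] with u(0) = u(1) = 0. The sketch below assembles element stiffness matrices (1/h)·[[1, −1], [−1, 1]] and load vectors (f·h/2)·[1, 1] into a global tridiagonal system and solves it with the Thomas algorithm. The model problem, the constant source f, and the function name are assumptions for illustration; general FE frameworks handle far more (meshes in 2-D/3-D, other element types, nonlinear problems).

```python
def fem_poisson_1d(n_elements, f=1.0):
    """Linear FEM for -u'' = f on [0, 1] with u(0) = u(1) = 0 and constant f."""
    h = 1.0 / n_elements
    n_nodes = n_elements + 1
    diag = [0.0] * n_nodes
    off = [0.0] * (n_nodes - 1)  # off[i] = A[i][i+1] = A[i+1][i]
    load = [0.0] * n_nodes
    for e in range(n_elements):
        # Element stiffness (1/h) * [[1, -1], [-1, 1]] and load (f*h/2) * [1, 1]
        diag[e] += 1.0 / h
        diag[e + 1] += 1.0 / h
        off[e] -= 1.0 / h
        load[e] += f * h / 2
        load[e + 1] += f * h / 2
    # Dirichlet conditions: drop boundary rows and columns (boundary values are 0).
    a, b, d = diag[1:-1], off[1:-1], load[1:-1]
    m = len(a)
    # Thomas algorithm for the symmetric tridiagonal interior system.
    cp, dp = [0.0] * m, [0.0] * m
    for i in range(m):
        denom = a[i] - (b[i - 1] * cp[i - 1] if i > 0 else 0.0)
        cp[i] = b[i] / denom if i < m - 1 else 0.0
        dp[i] = (d[i] - (b[i - 1] * dp[i - 1] if i > 0 else 0.0)) / denom
    u = [0.0] * m
    for i in range(m - 1, -1, -1):
        u[i] = dp[i] - (cp[i] * u[i + 1] if i < m - 1 else 0.0)
    nodes = [k * h for k in range(n_nodes)]
    return nodes, [0.0] + u + [0.0]


nodes, u = fem_poisson_1d(8)
for x, ui in zip(nodes, u):
    print(f"x={x:.3f}  u_h={ui:.6f}  exact={x * (1 - x) / 2:.6f}")
```

For this 1-D model problem, linear elements reproduce the exact solution x(1 − x)/2 at the nodes, which makes a convenient correctness check.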

Journal ArticleDOI
TL;DR: Exploratory factor analysis (EFA) is a data-driven approach to factor analysis and is used to extract a smaller number of common factors that represent or explain the common variance of a larger set of manifest variables.
Abstract: In the social sciences, factor analysis is a widely used tool to identify latent constructs underlying task performance or the answers to questionnaire items. Exploratory factor analysis (EFA) is a data-driven approach to factor analysis and is used to extract a smaller number of common factors that represent or explain the common variance of a larger set of manifest variables (see, e.g., Watkins, 2018 for an overview). Several decisions have to be made in advance when performing an EFA, including the number of factors to extract, and the extraction and rotation method to be used. After a factor solution has been found, it is useful to subject the resulting factor solution to an orthogonalization procedure to achieve a hierarchical factor solution with one general and several specific factors. This is especially relevant in intelligence research, where high positive factor intercorrelations are common. From this orthogonalized, hierarchical solution, the variance can then be partitioned to estimate the relative importance of the general versus the specific factors using omega reliability coefficients (e.g., McDonald, 1999).
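The variance partitioning described above can be illustrated with the usual omega formulas for an orthogonalized solution: with general-factor loadings λ_g, group-factor loadings λ_s, and uniquenesses u² = 1 − h², ω_total = [(Σλ_g)² + Σ_s(Σλ_s)²] / D and ω_hierarchical = (Σλ_g)² / D, where D = (Σλ_g)² + Σ_s(Σλ_s)² + Σu². The sketch assumes standardized items and uses this model-implied denominator (some implementations use the observed total-score variance instead). The loadings and function name are hypothetical, and this is not the package's implementation.

```python
def omega_coefficients(general, group_loadings):
    """Omega total and omega hierarchical from an orthogonalized factor solution.

    general: loading of each item on the general factor.
    group_loadings: dict mapping a group-factor name to {item index: loading}.
    Assumes standardized items, so each uniqueness is 1 - communality.
    """
    communality = [g * g for g in general]
    for loadings in group_loadings.values():
        for item, lam in loadings.items():
            communality[item] += lam * lam
    uniqueness = sum(1.0 - h2 for h2 in communality)
    general_var = sum(general) ** 2
    group_var = sum(sum(l.values()) ** 2 for l in group_loadings.values())
    total = general_var + group_var + uniqueness
    return {"omega_total": (general_var + group_var) / total,
            "omega_hierarchical": general_var / total}


# Six items: all load 0.6 on the general factor; two group factors of three items.
general = [0.6] * 6
groups = {"F1": {0: 0.4, 1: 0.4, 2: 0.4}, "F2": {3: 0.4, 4: 0.4, 5: 0.4}}
print(omega_coefficients(general, groups))
```

The gap between omega total and omega hierarchical is the share of total-score variance attributable to the specific factors rather than to the general factor.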

Journal ArticleDOI
TL;DR: Partial differential equations play a central role in describing the dynamics of physical systems in research and in practical applications, but equations appearing in realistic scenarios are typically non-linear and analytical solutions rarely exist.
Abstract: Partial differential equations (PDEs) play a central role in describing the dynamics of physical systems in research and in practical applications. However, equations appearing in realistic scenarios are typically non-linear and analytical solutions rarely exist. Instead, such systems are solved by numerical integration to provide insight into their behavior. Moreover, such investigations can motivate approximate solutions, which might then lead to analytical insight.
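Numerical integration of a PDE, as described above, is often done by the method of lines: discretize space, then step the resulting ODE system in time. The sketch below does this for the 1-D diffusion equation u_t = D·u_xx with central differences in space and forward Euler in time. Assumptions for illustration: the unit interval, zero boundary values, and the initial condition sin(πx), whose exact solution e^(−Dπ²t)·sin(πx) allows a check. The function name is hypothetical and this is not the package's API.

```python
import math


def heat_explicit(nx=21, diffusivity=1.0, t_final=0.1, safety=0.4):
    """Forward Euler / central differences for u_t = D u_xx on [0, 1], u = 0 at both ends."""
    dx = 1.0 / (nx - 1)
    dt = safety * dx * dx / diffusivity  # explicit scheme is stable for dt <= dx^2 / (2 D)
    n_steps = round(t_final / dt)
    dt = t_final / n_steps               # land exactly on t_final
    r = diffusivity * dt / (dx * dx)
    u = [math.sin(math.pi * i * dx) for i in range(nx)]
    u[0] = u[-1] = 0.0
    for _ in range(n_steps):
        u = ([0.0]
             + [u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, nx - 1)]
             + [0.0])
    return [i * dx for i in range(nx)], u


xs, u = heat_explicit()
exact_center = math.exp(-math.pi ** 2 * 0.1) * math.sin(math.pi * 0.5)
print(u[10], exact_center)
```

The time step is tied to the grid spacing by the stability limit of the explicit scheme; halving dx forces a four times smaller dt, which is why implicit or adaptive integrators are often preferred for stiff diffusion problems.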

Journal ArticleDOI
TL;DR: MyQueue is a front-end for schedulers that makes handling of tasks easy and has a command-line interface called mq with a number of sub-commands and a Python interface for managing workflows.
Abstract: Task scheduling and workload management in high-performance computing environments are usually handled with tools such as SLURM (Jette, Yoo, & Grondona, 2002). MyQueue is a front-end for schedulers that makes handling of tasks easy. It has a command-line interface called mq with a number of sub-commands and a Python interface for managing workflows. Currently, the following schedulers are supported: SLURM, PBS, and LSF.

Journal ArticleDOI
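TL;DR: dms-view is a web-based, interactive tool that links deep mutational scanning (DMS) data to 3-D protein structures, addressing the lack of existing tools that provide linked views of the protein structure and DMS data in a single interface to facilitate dynamic data exploration and sharing.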
Abstract: The high-throughput technique of deep mutational scanning (DMS) has recently made it possible to experimentally measure the effects of all amino-acid mutations to a protein. Over the past five years, this technique has been used to study dozens of different proteins and answer a variety of research questions. For example, DMS has been used for protein engineering, understanding the human immune response to viruses, and interpreting human variation in a clinical setting. Accompanying this proliferation of DMS studies has been the development of software tools and databases for data analysis and sharing. However, for many purposes it is important to also integrate and visualize the DMS data in the context of other information, such as the 3-D protein structure or natural sequence-variation data. Here we describe dms-view (https://dms-view.github.io/), a flexible, web-based, interactive visualization tool for DMS data. dms-view is written in JavaScript and D3, and links site-level and mutation-level DMS data to a 3-D protein structure. The user can interactively select sites of interest to examine the DMS measurements in the context of the protein structure. dms-view tracks the input data and user selections in the URL, making it possible to save specific views of interactively generated visualizations to share with collaborators or to support a published study. Importantly, dms-view takes a flexible input data file so users can easily visualize their own DMS data in the context of protein structures of their choosing, and also incorporate additional information such as amino-acid frequencies in natural alignments.
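Tracking the input data and user selections in the URL, as described above, amounts to serializing the view state into a query string and parsing it back when the link is opened. dms-view itself is written in JavaScript; the Python sketch below only illustrates the round trip with the standard library's `urllib.parse`. The parameter names `data`, `condition`, and `sites` are hypothetical and are not dms-view's actual URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_url(base_url, state):
    """Encode a view's state (data file, condition, selected sites) as a query string."""
    query = {
        "data": state["data"],
        "condition": state["condition"],
        "sites": ",".join(str(s) for s in sorted(state["sites"])),
    }
    return f"{base_url}?{urlencode(query)}"


def url_to_state(url):
    """Recover the view state from a shared URL."""
    params = parse_qs(urlsplit(url).query)
    sites = params.get("sites", [""])[0]
    return {
        "data": params["data"][0],
        "condition": params["condition"][0],
        "sites": [int(s) for s in sites.split(",") if s],
    }


state = {"data": "https://example.org/dms.csv", "condition": "antibody A",
         "sites": [220, 145, 156]}
url = state_to_url("https://dms-view.github.io/", state)
print(url)
print(url_to_state(url))
```

Because the whole view lives in the URL, no server-side storage is needed: copying the link is enough to reproduce the selection, as long as the referenced data file stays available.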

Journal ArticleDOI
TL;DR: In particular, large collected datasets contain data acquired using different instruments and measurement conditions, and can further contain a significant fraction of inconsistent, wrongly labeled, or incorrect metadata (annotations).
Abstract: Mass spectrometry data is at the heart of numerous applications in the biomedical and life sciences. With growing use of high-throughput techniques, researchers need to analyze larger and more complex datasets. In particular through joint effort in the research community, fragmentation mass spectrometry datasets are growing in size and number. Platforms such as MassBank (Horai et al., 2010), GNPS (Wang et al., 2016) or MetaboLights (Haug et al., 2020) serve as an open-access hub for sharing of raw, processed, or annotated fragmentation mass spectrometry data. Without suitable tools, however, exploitation of such datasets remains overly challenging. In particular, large collected datasets contain data acquired using different instruments and measurement conditions, and can further contain a significant fraction of inconsistent, wrongly labeled, or incorrect metadata (annotations).