scispace - formally typeset
Search or ask a question
JournalISSN: 2073-4859

R Journal 

The R Foundation
About: R Journal is an academic journal published by The R Foundation. The journal publishes majorly in the area(s): Computer science & Multivariate statistics. It has an ISSN identifier of 2073-4859. It is also open access. Over the lifetime, 580 publications have been published receiving 29924 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: The glmmTMB package fits many types of GLMMs and extensions, including models with continuously distributed responses, but here the authors focus on count responses and its ability to estimate the Conway-Maxwell-Poisson distribution parameterized by the mean is unique.
Abstract: Count data can be analyzed using generalized linear mixed models when observations are correlated in ways that require random effects However, count data are often zero-inflated, containing more zeros than would be expected from the typical error distributions We present a new package, glmmTMB, and compare it to other R packages that fit zero-inflated mixed models The glmmTMB package fits many types of GLMMs and extensions, including models with continuously distributed responses, but here we focus on count responses glmmTMB is faster than glmmADMB, MCMCglmm, and brms, and more flexible than INLA and mgcv for zero-inflated modeling One unique feature of glmmTMB (among packages that fit zero-inflated mixed models) is its ability to estimate the Conway-Maxwell-Poisson distribution parameterized by the mean Overall, its most appealing features for new users may be the combination of speed, flexibility, and its interface’s similarity to lme4

4,497 citations

Journal ArticleDOI
TL;DR: This updated version of mclust adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.
Abstract: Finite mixture models are being used increasingly to model a wide variety of random phenomena for clustering, classification and density estimation. mclust is a powerful and popular package which allows modelling of data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components, for a variety of purposes of analysis. Recently, version 5 of the package has been made available on CRAN. This updated version adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.

1,841 citations

Journal ArticleDOI
TL;DR: The sf package implements simple features in R, and has roughly the same capacity for spatial vector data as packages sp, rgeos and rgdal, and its place in the R package ecosystem, and the potential to connect R to other computer systems are described.
Abstract: Simple features are a standardized way of encoding spatial vector data (points, lines, polygons) in computers. The sf package implements simple features in R, and has roughly the same capacity for spatial vector data as packages sp, rgeos and rgdal. We describe the need for this package, its place in the R package ecosystem, and its potential to connect R to other computer systems. We illustrate this with examples of its use. What are simple features? Features can be thought of as “things” or objects that have a spatial location or extent; they may be physical objects like a building, or social conventions like a political state. Feature geometry refers to the spatial properties (location or extent) of a feature, and can be described by a point, a point set, a linestring, a set of linestrings, a polygon, a set of polygons, or a combination of these. The simple adjective of simple features refers to the property that linestrings and polygons are built from points connected by straight line segments. Features typically also have other properties (temporal properties, color, name, measured quantity), which are called feature attributes. Not all spatial phenomena are easy to represent by “things or objects”: continuous phenoma such as water temperature or elevation are better represented as functions mapping from continuous or sampled space (and time) to values (Scheider et al., 2016), and are often represented by raster data rather than vector (points, lines, polygons) data. Simple feature access (Herring, 2011) is an international standard for representing and encoding spatial data, dominantly represented by point, line and polygon geometries (ISO, 2004). It is widely used e.g. by spatial databases (Herring, 2010), GeoJSON (Butler et al., 2016), GeoSPARQL (Perry and Herring, 2012), and open source libraries that empower the open source geospatial software landscape including GDAL (Warmerdam, 2008), GEOS (GEOS Development Team, 2017) and liblwgeom (a PostGIS component, Obe and Hsu (2015)). The need for a new package Package sf (Pebesma, 2017) is an R package for reading, writing, handling and manipulating simple features in R, reimplementing the vector (points, lines, polygons) data handling functionality of packages sp (Pebesma and Bivand, 2005; Bivand et al., 2013), rgdal (Bivand et al., 2017) and rgeos (Bivand and Rundel, 2017). However, sp has some 400 direct reverse dependencies, and a few thousand indirect ones. Why was there a need to write a package with the potential to replace it? First of all, at the time of writing sp (2003) there was no standard for simple features, and the ESRI shapefile was by far the dominant file format for exchanging vector data. The lack of a clear (open) standard for shapefiles, the omnipresence of “bad” or malformed shapefiles, and the many limitations of the ways it can represent spatial data adversely affected sp, for instance in the way it represents holes in polygons, and a lack of discipline to register holes with their enclosing outer ring. Such ambiguities could influence plotting of data, or communication with other systems or libraries. The simple feature access standard is now widely adopted, but the sp package family has to make assumptions and do conversions to load them into R. This means that you cannot round-trip data, as of: loading data in R, manipulating them, exporting them and getting the same geometries back. With sf, this is no longer a problem. A second reason was that external libraries heavily used by R packages for reading and writing spatial data (GDAL) and for geometrical operations (GEOS) have developed stronger support for the simple feature standard. A third reason was that the package cluster now known as the tidyverse (Wickham, 2017, 2014), which includes popular packages such as dplyr (Wickham et al., 2017) and ggplot2 (Wickham, 2016), does not work well with the spatial classes of sp: • tidyverse packages assume objects not only behave like data.frames (which sp objects do by providing methods), but are data.frames in the sense of being a list with equally sized column vectors, which sp does not do. The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLE 2 • attempts to “tidy” polygon objects for plotting with ggplot2 (“fortify”) by creating data.frame objects with records for each polygon node (vertex) were neither robust nor efficient. A simple (S3) way to store geometries in data.frame or similar objects is to put them in a geometry list-column, where each list element contains the geometry object of the corresponding record, or data.frame “row”; this works well with the tidyverse package family.

1,625 citations

Journal ArticleDOI
TL;DR: This article details some new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps and presents an overview of a few utility functions.
Abstract: In spatial statistics the ability to visualize data and models superimposed with their basic social landmarks and geographic context is invaluable. ggmap is a new tool which enables such visualization by combining the spatial information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps with the layered grammar of graphics implementation of ggplot2. In addition, several new utility functions are introduced which allow the user to access the Google Geocoding, Distance Matrix, and Directions APIs. The result is an easy, consistent and modular framework for spatial graphics with several convenient tools for spatial data analysis. Introduction Visualizing spatial data in R can be a challenging task. Fortunately the task is made a good deal easier by the data structures and plot methods of sp, RgoogleMaps, and related packages (Pebesma and Bivand, 2006; Bivand et al., 2008; Loecher and Berlin School of Economics and Law, 2013). Using those methods, one can plot the basic geographic information of (for instance) a shape file containing polygons for areal data or points for point referenced data. However, compared to specialized geographic information systems (GISs) such as ESRI’s ArcGIS, which can plot points, polygons, etc. on top of maps and satellite imagery with drag-down menus, these visualizations can be pretty disappointing. This article details some new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps (Wickham, 2009, 2010). The result is an easy to use R package named ggmap. After describing the nuts and bolts of ggmap, we showcase some of its capabilities in a simple case study concerning violent crimes in downtown Houston, Texas and present an overview of a few utility functions. Plotting spatial data in R Areal data is data which corresponds to geographical extents with polygonal boundaries. A typical example is the number of residents per zip code. Considering only the boundaries of the areal units, we are used to seeing areal plots in R which resemble those in Figure 1 (left). -96.0 -95.5 -95.0 -94.5 29 .0 29 .5 30 .0 30 .5 longitude la tit ud e -96.0 -95.5 -95.0 -94.5 29 .0 29 .5 30 .0 30 .5 longitude la tit ud e Figure 1: A typical R areal plot – zip codes in the Greater Houston area (left), and a typical R spatial scatterplot – murders in Houston from January 2010 to August 2010 (right). While these kinds of plots are useful, they are not as informative as we would like in many situations. For instance, when plotting zip codes it is helpful to also see major roads and other landmarks which form the boundaries of areal units. The situation for point referenced spatial data is often much worse. Since we can’t easily contextualize a scatterplot of points without any background information at all, it is common to add points as The R Journal Vol. 5/1, June ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES 145 an overlay of some areal data—whatever areal data is available. The resulting plot looks like Figure 1 (right). In most cases the plot is understandable to the researcher who has worked on the problem for some time but is of hardly any use to his audience, who must work to associate the data of interest with their location. Moreover, it leaves out many practical details—are most of the events to the east or west of landmark x? Are they clustered around more well-to-do parts of town, or do they tend to occur in disadvantaged areas? Questions like these can’t really be answered using these kinds of graphics because we don’t think in terms of small scale areal boundaries (e.g. zip codes or census tracts). With a little effort better plots can be made, and tools such as maps, maptools, sp, or RgoogleMaps make the process much easier; in fact, RgoogleMaps was the inspiration for ggmap (Becker et al., 2013; Bivand and Lewin-Koh, 2013). Moreover, there has recently been a deluge of interest in the subject of mapmaking in R—Ian Fellows’ excellent interactive GUI-driven DeducerSpatial package based on Bing Maps comes to mind (Fellows et al., 2013). ggmap takes another step in this direction by situating the contextual information of various kinds of static maps in the ggplot2 plotting framework. The result is an easy, consistent way of specifying plots which are readily interpretable by both expert and audience and safeguarded from graphical inconsistencies by the layered grammar of graphics framework. The result is a spatial plot resembling Figure 2. Note that map images and information in this work may appear slightly different due to map provider changes over time. murder

1,541 citations

Journal ArticleDOI
TL;DR: Brms provides an intuitive and powerful formula syntax, which extends the well known formula syntax of lme4, which is introduced in detail and demonstrated its usefulness with four examples, each showing other relevant aspects of the syntax.
Abstract: The brms package allows R users to easily specify a wide range of Bayesian single-level and multilevel models, which are fitted with the probabilistic programming language Stan behind the scenes. Several response distributions are supported, of which all parameters (e.g., location, scale, and shape) can be predicted at the same time thus allowing for distributional regression. Non-linear relationships may be specified using non-linear predictor terms or semi-parametric approaches such as splines or Gaussian processes. To make all of these modeling options possible in a multilevel framework, brms provides an intuitive and powerful formula syntax, which extends the well known formula syntax of lme4. The purpose of the present paper is to introduce this syntax in detail and to demonstrate its usefulness with four examples, each showing other relevant aspects of the syntax.

1,463 citations

Performance
Metrics
No. of papers from the Journal in previous years
YearPapers
202323
202261
202129
202047
201986
201838