Journal Article

ggmap: Spatial Visualization with ggplot2

01 Jan 2013 - R Journal (The R Foundation) - Vol. 5, Iss: 1, pp 144-161
TL;DR: This article details some new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps and presents an overview of a few utility functions.
Abstract: In spatial statistics the ability to visualize data and models superimposed with their basic social landmarks and geographic context is invaluable. ggmap is a new tool which enables such visualization by combining the spatial information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps with the layered grammar of graphics implementation of ggplot2. In addition, several new utility functions are introduced which allow the user to access the Google Geocoding, Distance Matrix, and Directions APIs. The result is an easy, consistent and modular framework for spatial graphics with several convenient tools for spatial data analysis.

Introduction

Visualizing spatial data in R can be a challenging task. Fortunately the task is made a good deal easier by the data structures and plot methods of sp, RgoogleMaps, and related packages (Pebesma and Bivand, 2006; Bivand et al., 2008; Loecher and Berlin School of Economics and Law, 2013). Using those methods, one can plot the basic geographic information of (for instance) a shape file containing polygons for areal data or points for point referenced data. However, compared to specialized geographic information systems (GISs) such as ESRI’s ArcGIS, which can plot points, polygons, etc. on top of maps and satellite imagery with drag-down menus, these visualizations can be pretty disappointing.

This article details some new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps (Wickham, 2009, 2010). The result is an easy to use R package named ggmap. After describing the nuts and bolts of ggmap, we showcase some of its capabilities in a simple case study concerning violent crimes in downtown Houston, Texas and present an overview of a few utility functions.
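The utility functions mentioned above wrap web services such as the Google Geocoding API. As a rough illustration only, the Python sketch below shows how a client might assemble a Geocoding request URL. It is not ggmap's implementation (ggmap is an R package); the helper name `build_geocode_url` and the placeholder key are hypothetical, and the sketch assumes the public JSON endpoint with its `address` and `key` query parameters. No network request is made.

```python
from urllib.parse import urlencode

# Public JSON endpoint of the Google Geocoding API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address: str, api_key: str) -> str:
    """Return a request URL for geocoding a free-form address.

    Hypothetical helper for illustration; it is not part of ggmap.
    """
    # urlencode percent-encodes the values and joins them with "&".
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working credential.
    print(build_geocode_url("6150 Main St, Houston, TX", "YOUR_API_KEY"))
```

A real client would then fetch this URL and read the latitude and longitude from the JSON response; that step is left out here because it needs a valid key and network access.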
Plotting spatial data in R

Areal data is data which corresponds to geographical extents with polygonal boundaries. A typical example is the number of residents per zip code. Considering only the boundaries of the areal units, we are used to seeing areal plots in R which resemble those in Figure 1 (left).

[Figure 1 (two panels; axes: longitude, latitude): A typical R areal plot – zip codes in the Greater Houston area (left), and a typical R spatial scatterplot – murders in Houston from January 2010 to August 2010 (right).]

While these kinds of plots are useful, they are not as informative as we would like in many situations. For instance, when plotting zip codes it is helpful to also see major roads and other landmarks which form the boundaries of areal units. The situation for point referenced spatial data is often much worse. Since we can’t easily contextualize a scatterplot of points without any background information at all, it is common to add points as an overlay of some areal data, whatever areal data is available. The resulting plot looks like Figure 1 (right). In most cases the plot is understandable to the researcher who has worked on the problem for some time but is of hardly any use to his audience, who must work to associate the data of interest with their location. Moreover, it leaves out many practical details: are most of the events to the east or west of landmark x? Are they clustered around more well-to-do parts of town, or do they tend to occur in disadvantaged areas? Questions like these can’t really be answered using these kinds of graphics because we don’t think in terms of small scale areal boundaries (e.g. zip codes or census tracts).

The R Journal Vol. 5/1, June ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLES
With a little effort better plots can be made, and tools such as maps, maptools, sp, or RgoogleMaps make the process much easier; in fact, RgoogleMaps was the inspiration for ggmap (Becker et al., 2013; Bivand and Lewin-Koh, 2013). Moreover, there has recently been a deluge of interest in the subject of mapmaking in R; Ian Fellows’ excellent interactive GUI-driven DeducerSpatial package based on Bing Maps comes to mind (Fellows et al., 2013). ggmap takes another step in this direction by situating the contextual information of various kinds of static maps in the ggplot2 plotting framework. The result is an easy, consistent way of specifying plots which are readily interpretable by both expert and audience and safeguarded from graphical inconsistencies by the layered grammar of graphics framework. The result is a spatial plot resembling Figure 2. Note that map images and information in this work may appear slightly different due to map provider changes over time.


Citations
Journal Article
TL;DR: The book describes clearly and intuitively the differences between exploratory and confirmatory factor analysis, and discusses how to construct, validate, and assess the goodness of fit of a measurement model in SEM by confirmatory factor analysis.
Abstract: Examples are discussed to show the differences among discriminant analysis, logistic regression, and multiple regression. Chapter 6, “Multivariate Analysis of Variance,” presents advantages of multivariate analysis of variance (MANOVA) over univariate analysis of variance (ANOVA), discusses assumptions of MANOVA, and assesses validations of MANOVA assumptions and model estimation. The authors also discuss post hoc tests of MANOVA and multivariate analysis of covariance. Chapter 7, “Conjoint Analysis,” explains what conjoint analysis does and how it is different from other multivariate techniques. Guidelines for selecting attributes, models, and methods of data collection are presented. Chapter 8, “Cluster Analysis,” studies objectives, roles, and limitations of cluster analysis. Two basic concepts, similarity and distance, are discussed. The authors also discuss details of the five most popular hierarchical algorithms (single-linkage, complete-linkage, average-linkage, centroid method, Ward’s method) and three nonhierarchical algorithms (the sequential threshold method, the parallel threshold method, and the optimizing procedure). Profiles of clusters and guidelines for cluster validation are studied as well. Chapter 9, “Multidimensional Scaling and Correspondence Analysis,” introduces two interdependence techniques to display the relationships in the data. The book describes clearly and intuitively the differences between the two techniques and how these two techniques are performed. Chapters 10–12 cover topics in SEM. Chapter 10, “Structural Equation Modeling: An Introduction,” introduces SEM and related concepts such as exogenous, endogenous constructs, and so on, points out the differences between SEM and other multivariate techniques, and gives an overview of the decision process of SEM.
Chapter 11, “Confirmatory Factor Analysis,” explains the differences between exploratory and confirmatory factor analysis, and discusses how to construct, validate, and assess the goodness of fit of a measurement model in SEM by confirmatory factor analysis. Chapter 12, “Testing a Structural Model,” presents some methods of SEM in examining the relationships between latent constructs. The book is an excellent book for people in management and marketing. For the Technometrics audience, this book does not have much flavor of physical, chemical, and engineering sciences. For example, partial least squares, a very popular method in Chemometrics, is discussed but not in as much detail as other techniques in the book. Furthermore, due to the amount of material covered in the book, it might be inappropriate for someone who is new to multivariate analysis.

497 citations

Journal Article
TL;DR: In this paper, the authors address issues with data quality, especially incorrect species occurrence records from online databases, which are indispensable resources in ecological, biogeographical and palaeontological research.
Abstract: Species occurrence records from online databases are an indispensable resource in ecological, biogeographical and palaeontological research. However, issues with data quality, especially incorrect ...

379 citations


Cites methods from "ggmap: Spatial Visualization with g..."

  • ...…International, 2017; GeoNames, 2017; Global Biodiversity Information Facility, 2017; Index Herbariorum, 2017; The Global Registry of Biodiversity Repositories, 2017; Wikipedia, 2017) and georeferenced them using the ggmap and openCage R packages (Kahle & Wickham, 2013; Salmon, 2017)....


Journal Article
TL;DR: Densities of microplastics in Lake Winnipeg are reported for the first time, adding to the growing evidence that microplastic contamination is widespread even around sparsely-populated freshwater ecosystems, and provides a baseline for future study and risk assessments.

285 citations


Cites methods from "ggmap: Spatial Visualization with g..."

  • ...Visualization of microplastic density in the lake was mapped using the packages ‘ggmap’ and ‘ggplot2’ (Kahle and Wickham, 2013; Wickham, 2009)....


Journal Article
TL;DR: This work characterized and controlled for diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos and simultaneously estimated population-structure principal components robust to familial relatedness and pairwise kinship coefficients robust to population structure, admixture, and Hardy-Weinberg departures.
Abstract: US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a "genetic-analysis group" variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness.

234 citations

Journal Article
TL;DR: In this paper, the authors used statistical postprocessing of systematic errors to obtain reliable and accurate probabilistic forecasts for ensemble weather predictions. But, this is done with distri...
Abstract: Ensemble weather predictions require statistical postprocessing of systematic errors to obtain reliable and accurate probabilistic forecasts. Traditionally, this is accomplished with distri...

220 citations

References
Book
13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to:
  • produce handsome, publication-quality plots, with automatic legends created from the plot specification
  • superpose multiple layers (points, lines, maps, tiles, box plots to name a few) from different data sources, with automatically adjusted common scales
  • add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression
  • save any ggplot2 plot (or part thereof) for later modification or reuse
  • create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots
  • approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot.
This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.

29,504 citations


"ggmap: Spatial Visualization with g..." refers methods in this paper

  • ...CloudMade Maps takes the tile styling even further by allowing the user to either (1) select among thousands of user-made sets or (2) create an entirely new style with a simple online editor where the user can specify colors, lines, and so forth for various types of roads, waterways, landmarks, etc., all of which are generated by CloudMade and accessible in ggmap. ggmap, through get_map (or get_cloudmademap) allows for both options....


  • ...Style is where Stamen Maps and CloudMade Maps really shine....


  • ...The new osmar package integrates R and the OpenStreetMap data structures with which OpenStreetMap maps, Stamen Maps, and CloudMade Maps are rendered, thereby opening a floodgate of possibilities for plotting geographic objects on top of maps or satellite imagery all within R using ggmap (Eugster and Schlesinger, 2013)....


  • ...The one minor drawback to using CloudMade Maps is that the user must register with CloudMade to obtain an API key and then pass the API key into get_map with the api_key argument....


  • ...Tile style – the source and maptype arguments of get_map The most attractive aspect of using different map sources (Google Maps, OpenStreetMap, Stamen Maps, and CloudMade Maps) is the different map styles provided by the producer....


Journal Article
TL;DR: This paper gives rise to a new R package that allows you to smoothly apply a split-apply-combine strategy, without having to worry about the type of structure in which your data is stored.
Abstract: Many data analysis problems involve the application of a split-apply-combine strategy, where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together. This insight gives rise to a new R package that allows you to smoothly apply this strategy, without having to worry about the type of structure in which your data is stored. The paper includes two case studies showing how these insights make it easier to work with batting records for veteran baseball players and a large 3d array of spatio-temporal ozone measurements.
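The split-apply-combine strategy described in this abstract can be sketched in a few lines. The example below is plain Python rather than the R package itself, and the function name `split_apply_combine` and the batting records are made up for illustration. It breaks the records into groups by a key, runs a function on each group independently, and collects the per-group results into one dictionary.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(records, key, apply):
    """Group records by key(record), run apply on each group, collect results."""
    groups = defaultdict(list)
    for record in records:
        # Split: route each record to the group named by its key.
        groups[key(record)].append(record)
    # Apply and combine: one result per group, keyed by the group label.
    return {label: apply(group) for label, group in groups.items()}


# Made-up batting records: (player, year, hits).
records = [
    ("player_a", 2001, 172),
    ("player_a", 2002, 204),
    ("player_b", 2001, 198),
    ("player_b", 2002, 196),
]

# Mean hits per player across seasons.
mean_hits = split_apply_combine(
    records,
    key=lambda r: r[0],
    apply=lambda group: mean(r[2] for r in group),
)
print(mean_hits)
```

Keeping the grouping key and the per-group function as separate arguments means the same helper can answer a different question (for example, total hits per year) without changing the splitting or combining logic.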

2,243 citations


"ggmap: Spatial Visualization with g..." refers methods in this paper


  • ...The data were lightly cleaned and aggregated using plyr (Wickham, 2011) and geocoded using Google Maps (to the center of the block, e.g., 6150 Main St.); the full data set is available in ggmap as the data set crime....


Book
21 Oct 2008
TL;DR: Hello, world: handling spatial data in R.
Abstract: Hello, world: handling spatial data in R.- Classes for spatial data in R.- Visualizing spatial data.- Spatial data import and export.- Further methods for handling spatial data.- Customising spatial data classes and methods.- Spatial point pattern analysis.- Interpolation and geostatistics.- Areal data and spatial autocorrelation.- Modelling areal data.- Disease mapping.- Afterword.- References.

2,105 citations

18 Oct 2015

1,493 citations


"ggmap: Spatial Visualization with g..." refers methods in this paper

  • ...Fortunately the task is made a good deal easier by the data structures and plot methods of sp, RgoogleMaps, and related packages (Pebesma and Bivand, 2006; Bivand et al., 2008; Loecher and Berlin School of Economics and Law, 2013)....


Journal Article
01 Jun 2013

1,136 citations


"ggmap: Spatial Visualization with g..." refers methods in this paper

  • ...Fortunately the task is made a good deal easier by the data structures and plot methods of sp, RgoogleMaps, and related packages (Pebesma and Bivand, 2006; Bivand et al., 2008; Loecher and Berlin School of Economics and Law, 2013)....
