Book

The Grammar of Graphics

Leland Wilkinson
01 Jan 1999
TL;DR: The Grammar of Graphics (GOG) denotes a system with seven orthogonal components: seven graphical component sets whose elements are aspects of the general system, such that every combination of aspects in the product of these sets is meaningful.
Abstract: The Grammar of Graphics, or GOG, denotes a system with seven orthogonal components. By orthogonal, we mean there are seven graphical component sets whose elements are aspects of the general system and that every combination of aspects in the product of all these sets is meaningful. This sense of the word orthogonality, a term used by computer designers to describe a combinatoric system of components or building blocks, is in some sense similar to the orthogonal factorial analysis of variance (ANOVA), where factors have levels and all possible combinations of levels exist in the ANOVA design. If we interpret each combination of features in a GOG system as a point in a network, then the world described by GOG is represented in a seven-dimensional rectangular lattice.
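The product structure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names (`c1` to `c7`) and their aspects are hypothetical placeholders, not the component sets defined in the book, and `lattice_points` is a helper invented for this sketch. It enumerates one aspect per component set, so each result is a single point in the seven-dimensional rectangular lattice.

```python
from itertools import product
from math import prod

# Seven placeholder component sets. The names and their aspects are
# illustrative assumptions, not the component sets defined in the book.
COMPONENT_SETS = {
    "c1": ("a", "b"),
    "c2": ("a", "b", "c"),
    "c3": ("a", "b"),
    "c4": ("a",),
    "c5": ("a", "b"),
    "c6": ("a", "b", "c"),
    "c7": ("a", "b"),
}


def lattice_points(component_sets):
    """Return every combination that picks one aspect from each component set."""
    names = list(component_sets)
    return [
        dict(zip(names, combo))
        for combo in product(*component_sets.values())
    ]


points = lattice_points(COMPONENT_SETS)

# The lattice size is the product of the sizes of the component sets.
expected_size = prod(len(aspects) for aspects in COMPONENT_SETS.values())
```

As in a full factorial ANOVA design, no combination is excluded: the number of points equals the product of the set sizes, and orthogonality in the abstract's sense is the claim that each of those points is a meaningful specification.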
Citations
Journal Article
22 Apr 2013 - PLOS ONE
TL;DR: The phyloseq project for R is a new open-source software package dedicated to the object-oriented representation and analysis of microbiome census data in R, which supports importing data from a variety of common formats, as well as many analysis techniques.
Abstract: Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

11,272 citations

Journal Article
Authors: Hadley Wickham, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, Alex Hayes, Lionel Henry, Jim Hester, Max Kuhn, Thomas Lin Pedersen, Evan Miller, Stephan Milton Bache, Kirill Müller, Jeroen Ooms, David Robinson, Dana Paige Seidel, Vitalie Spinu, Kohske Takahashi, Davis Vaughan, Claus Wilke, Kara Woo, and Hiroaki Yutani

7,298 citations

Journal Article
TL;DR: The toolkit incorporates over 130 functions designed to meet the increasing demand for big-data analyses, ranging from bulk sequence processing to interactive data visualization, along with a new plotting engine developed to maximize their interactive ability.

5,173 citations

Journal Article
TL;DR: This article details some new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps and presents an overview of a few utility functions.
Abstract: In spatial statistics the ability to visualize data and models superimposed with their basic social landmarks and geographic context is invaluable. ggmap is a new tool which enables such visualization by combining the spatial information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps with the layered grammar of graphics implementation of ggplot2. In addition, several new utility functions are introduced which allow the user to access the Google Geocoding, Distance Matrix, and Directions APIs. The result is an easy, consistent and modular framework for spatial graphics with several convenient tools for spatial data analysis.
Introduction
Visualizing spatial data in R can be a challenging task. Fortunately the task is made a good deal easier by the data structures and plot methods of sp, RgoogleMaps, and related packages (Pebesma and Bivand, 2006; Bivand et al., 2008; Loecher and Berlin School of Economics and Law, 2013). Using those methods, one can plot the basic geographic information of (for instance) a shape file containing polygons for areal data or points for point referenced data. However, compared to specialized geographic information systems (GISs) such as ESRI’s ArcGIS, which can plot points, polygons, etc. on top of maps and satellite imagery with drag-down menus, these visualizations can be pretty disappointing. This article details some new methods for the visualization of spatial data in R using the layered grammar of graphics implementation of ggplot2 in conjunction with the contextual information of static maps from Google Maps, OpenStreetMap, Stamen Maps or CloudMade Maps (Wickham, 2009, 2010). The result is an easy to use R package named ggmap. After describing the nuts and bolts of ggmap, we showcase some of its capabilities in a simple case study concerning violent crimes in downtown Houston, Texas and present an overview of a few utility functions.
[The R Journal Vol. 5/1, June ISSN 2073-4859]
Plotting spatial data in R
Areal data is data which corresponds to geographical extents with polygonal boundaries. A typical example is the number of residents per zip code. Considering only the boundaries of the areal units, we are used to seeing areal plots in R which resemble those in Figure 1 (left).
[Figure 1: A typical R areal plot – zip codes in the Greater Houston area (left), and a typical R spatial scatterplot – murders in Houston from January 2010 to August 2010 (right). Axes: longitude, latitude.]
While these kinds of plots are useful, they are not as informative as we would like in many situations. For instance, when plotting zip codes it is helpful to also see major roads and other landmarks which form the boundaries of areal units. The situation for point referenced spatial data is often much worse. Since we can’t easily contextualize a scatterplot of points without any background information at all, it is common to add points as an overlay of some areal data, whatever areal data is available. The resulting plot looks like Figure 1 (right). In most cases the plot is understandable to the researcher who has worked on the problem for some time but is of hardly any use to his audience, who must work to associate the data of interest with their location. Moreover, it leaves out many practical details: are most of the events to the east or west of landmark x? Are they clustered around more well-to-do parts of town, or do they tend to occur in disadvantaged areas? Questions like these can’t really be answered using these kinds of graphics because we don’t think in terms of small scale areal boundaries (e.g. zip codes or census tracts).
With a little effort better plots can be made, and tools such as maps, maptools, sp, or RgoogleMaps make the process much easier; in fact, RgoogleMaps was the inspiration for ggmap (Becker et al., 2013; Bivand and Lewin-Koh, 2013). Moreover, there has recently been a deluge of interest in the subject of mapmaking in R; Ian Fellows’ excellent interactive GUI-driven DeducerSpatial package based on Bing Maps comes to mind (Fellows et al., 2013). ggmap takes another step in this direction by situating the contextual information of various kinds of static maps in the ggplot2 plotting framework. The result is an easy, consistent way of specifying plots which are readily interpretable by both expert and audience and safeguarded from graphical inconsistencies by the layered grammar of graphics framework. The result is a spatial plot resembling Figure 2. Note that map images and information in this work may appear slightly different due to map provider changes over time.

1,541 citations

References
Journal Article

11,905 citations

Book
01 Jan 1925
TL;DR: The prime object of this book is to put into the hands of research workers, and especially of biologists, the means of applying statistical tests accurately to numerical data accumulated in their own laboratories or available in the literature.
Abstract: The prime object of this book is to put into the hands of research workers, and especially of biologists, the means of applying statistical tests accurately to numerical data accumulated in their own laboratories or available in the literature.

11,308 citations

Journal Article

8,052 citations

Journal Article
S. S. Stevens
07 Jun 1946 - Science

4,080 citations