Book

ggplot2: Elegant Graphics for Data Analysis

13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to: produce handsome, publication-quality plots, with automatic legends created from the plot specification; superpose multiple layers (points, lines, maps, tiles, box plots, to name a few) from different data sources, with automatically adjusted common scales; add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression; save any ggplot2 plot (or part thereof) for later modification or reuse; create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots; and approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.

Citations
Journal ArticleDOI
TL;DR: The lmerTest package extends the 'lmerMod' class of the lme4 package by overloading the anova and summary functions to provide p values for tests of fixed effects, and implements Satterthwaite's method for approximating degrees of freedom for the t and F tests.
Abstract: One of the frequent questions by users of the mixed model function lmer of the lme4 package has been: How can I get p values for the F and t tests for objects returned by lmer? The lmerTest package extends the 'lmerMod' class of the lme4 package, by overloading the anova and summary functions by providing p values for tests for fixed effects. We have implemented the Satterthwaite's method for approximating degrees of freedom for the t and F tests. We have also implemented the construction of Type I - III ANOVA tables. Furthermore, one may also obtain the summary as well as the anova table using the Kenward-Roger approximation for denominator degrees of freedom (based on the KRmodcomp function from the pbkrtest package). Some other convenient mixed model analysis tools such as a step method, that performs backward elimination of nonsignificant effects - both random and fixed, calculation of population means and multiple comparison tests together with plot facilities are provided by the package as well.

12,305 citations


Cites methods from "ggplot2: Elegant Graphics for Data ..."

  • ...The ggplot2 package (Wickham 2009) is used in the lmerTest package for generating the barplots for the least square means and differences of least square means....

    [...]

Journal ArticleDOI
22 Apr 2013-PLOS ONE
TL;DR: The phyloseq project for R is a new open-source software package dedicated to the object-oriented representation and analysis of microbiome census data in R, which supports importing data from a variety of common formats, as well as many analysis techniques.
Abstract: Background: The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data. Results: Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research. Conclusions: The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.

11,272 citations

Journal ArticleDOI
TL;DR: This protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results, which takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.
Abstract: Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ∼1 h of hands-on time.

10,913 citations

Journal ArticleDOI
TL;DR: The Central Brain Tumor Registry of the United States (CBTRUS), in collaboration with the Centers for Disease Control and Prevention and National Cancer Institute, is the largest population-based registry focused exclusively on primary brain and other central nervous system (CNS) tumors in the US.
Abstract: The Central Brain Tumor Registry of the United States (CBTRUS), in collaboration with the Centers for Disease Control (CDC) and National Cancer Institute (NCI), is the largest population-based registry focused exclusively on primary brain and other central nervous system (CNS) tumors in the United States (US) and represents the entire US population. This report contains the most up-to-date population-based data on primary brain tumors (malignant and non-malignant) and supersedes all previous CBTRUS reports in terms of completeness and accuracy. All rates (incidence and mortality) are age-adjusted using the 2000 US standard population and presented per 100,000 population. The average annual age-adjusted incidence rate (AAAIR) of all malignant and non-malignant brain and other CNS tumors was 23.79 (Malignant AAAIR=7.08, non-Malignant AAAIR=16.71). This rate was higher in females compared to males (26.31 versus 21.09), Blacks compared to Whites (23.88 versus 23.83), and non-Hispanics compared to Hispanics (24.23 versus 21.48). The most commonly occurring malignant brain and other CNS tumor was glioblastoma (14.5% of all tumors), and the most common non-malignant tumor was meningioma (38.3% of all tumors). Glioblastoma was more common in males, and meningioma was more common in females. In children and adolescents (age 0-19 years), the incidence rate of all primary brain and other CNS tumors was 6.14. An estimated 83,830 new cases of malignant and non-malignant brain and other CNS tumors are expected to be diagnosed in the US in 2020 (24,970 malignant and 58,860 non-malignant). There were 81,246 deaths attributed to malignant brain and other CNS tumors between 2013 and 2017. This represents an average annual mortality rate of 4.42. The 5-year relative survival rate following diagnosis of a malignant brain and other CNS tumor was 23.5% and for a non-malignant brain and other CNS tumor was 82.4%.

9,802 citations

Journal ArticleDOI
Authors: Hadley Wickham, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, Alex Hayes, Lionel Henry, Jim Hester, Max Kuhn, Thomas Lin Pedersen, Evan Miller, Stephan Milton Bache, Kirill Müller, Jeroen Ooms, David Robinson, Dana Paige Seidel, Vitalie Spinu, Kohske Takahashi, Davis Vaughan, Claus Wilke, Kara Woo, and Hiroaki Yutani

7,298 citations


Cites methods from "ggplot2: Elegant Graphics for Data ..."

  • ..., 2019a), forcats (Wickham, 2019a), ggplot2 (Wickham, 2016), purrr (Henry & Wickham, 2019), readr (Wickham & Hester, 2018), stringr (Wickham, 2019b), tibble (Müller & Wickham, 2018), and tidyr (Wickham & Henry, 2019)....

    [...]

  • ...The tidyverse provides the ggplot2 (Wickham, 2016) package for visualisation....

    [...]