The ggplot2 package allows you to rapidly make impressive plots from complex data sets.
In addition to this tutorial there are several excellent resources available on the web:
ggplot is part of the tidyverse, so to use these functions in your own R session you would need to load that library. I’ve pre-loaded it for this tutorial. Otherwise, you would need to enter
library(tidyverse)
(You can also just use library(ggplot2) if you don’t want or need the other tidyverse packages.)
We again work with the Tomato data set. It has been preloaded for this tutorial.
head(tomato)
To remind you, it has the following columns:
Let us begin by asking if there is a relationship between the length of the petiole (leaf stem) and the length of the leaf blade:
ggplot(data=tomato,
mapping = aes(x=petleng,y=leafleng)) +
geom_point()
Yes, it looks like there is.
Let’s look at each line of the code above:
ggplot(data=tomato,
ggplot is the main function and initiates the plot. The data argument tells ggplot what data set to work on.
mapping = aes(x=petleng,y=leafleng)) +
Here we are telling ggplot which columns in our data set should be mapped to particular plot AESthetics. The argument is called mapping and the input to mapping is the aes() function. We use aes() to tell ggplot that petiole length should be mapped to the x axis and leaf length should be mapped to the y axis. The mapping line ends with a +. This tells R that we want to add to the plot created by the ggplot function.
geom_point()
Geoms (geometries) indicate what type of plot we want to make. geom_point() draws one point per row of data, giving a scatter plot.
We will look at these functions in more detail as the tutorial continues…
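To see what the + is doing, it can help to build the plot in two steps. The sketch below uses a small made-up data frame called demo (the values are invented; only the column names follow the tomato data), so it can be run outside this tutorial.

```r
library(ggplot2)

# Stand-in data: values are invented; only the column names mirror tomato
demo <- data.frame(petleng  = c(12, 18, 25, 31, 40),
                   leafleng = c(20, 27, 33, 45, 52))

# Step 1: ggplot() records the data and the mappings but has no layers yet
base <- ggplot(data = demo, mapping = aes(x = petleng, y = leafleng))

# Step 2: adding a geom puts a layer on top of that base plot
scatter <- base + geom_point()

length(base$layers)     # layers before adding the geom
length(scatter$layers)  # layers after adding the geom

# Type `scatter` at the console to draw the plot
```

Storing the plot in a variable is optional, but it makes it easy to try different geoms on the same base.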
Note that since we loaded tidyverse we could also pipe our data into ggplot:
tomato %>% ggplot(mapping = aes(x=petleng,y=leafleng)) +
geom_point()
In this section we will explore aesthetics. As noted above, aesthetics control the relationship between your data and plot elements.
One common aesthetic is the color aesthetic.
Change the code below so that color is mapped to the treatment (trt) column, thereby re-creating the plot shown above:
tomato %>% ggplot(mapping = aes(x=petleng,y=leafleng)) +
geom_point()
The shape aesthetic controls the shape of the plotted points.
25 shapes are available:
Note: the fill color of shapes 21-24 is controlled with the fill aesthetic. The color of the rest of the shapes, as well as the border of shapes 21-24, is controlled with the color aesthetic.
Create a plot of int3 vs int4 where color indicates trt, and shape indicates who measured the plant.
Your plot should look like this:
The size aesthetic controls the size of the plotted points.
To practice, create a plot of latitude vs longitude where altitude is indicated by the size of the point and species is indicated by color.
What if you want to change a plot characteristic but not have it mapped to a data column? You can do this by setting the characteristic in the geom call, but outside of the aes function:
tomato %>% ggplot(mapping = aes(x=petleng,y=leafleng)) +
geom_point(color="skyblue")
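The difference between setting and mapping is easiest to see side by side. This is a minimal sketch on an invented data frame (demo is a stand-in, not the tomato data): the first plot sets a fixed color outside aes(), while the second puts the same string inside aes(), where it is treated as data.

```r
library(ggplot2)

# Stand-in data: values are invented; only the column names mirror tomato
demo <- data.frame(petleng  = c(12, 18, 25, 31, 40),
                   leafleng = c(20, 27, 33, 45, 52))

# Set: a fixed value given outside aes(), so every point is drawn in skyblue
set_plot <- ggplot(demo, aes(x = petleng, y = leafleng)) +
  geom_point(color = "skyblue")

# Mapped: inside aes(), "skyblue" is treated as a one-level category,
# so ggplot picks a default color for it and adds a legend
mapped_plot <- ggplot(demo, aes(x = petleng, y = leafleng)) +
  geom_point(aes(color = "skyblue"))

# The fixed value is stored as a layer parameter, not as a mapping
set_plot$layers[[1]]$aes_params
```

If a color you asked for does not show up and an unexpected legend appears, a value placed inside aes() by mistake is a likely cause.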
There are many more aesthetics available, depending on the geom used. Some of these will be introduced as you learn about additional geoms.
Geoms control the type of plot that is made. You have already seen one geom, geom_point.
geom_smooth allows you to add trend lines to your plots, for example:
tomato %>% ggplot(aes(x=lon, y = lat)) +
geom_smooth()
But wait, what if you also want the original data points? We can add multiple geoms to a plot:
tomato %>% ggplot(aes(x=lon, y = lat)) +
geom_smooth() +
geom_point()
By default geom_smooth fits a smoothed line to the data. But you can also show the best-fit straight line from a linear regression. To do this we tell geom_smooth to use the “lm” (linear model) method:
tomato %>% ggplot(aes(x=lon, y = lat)) +
geom_smooth(method="lm") +
geom_point()
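The line drawn by method="lm" is an ordinary linear regression of y on x. As a rough illustration, the sketch below simulates some data (demo and its values are invented, not the tomato measurements), fits the same model directly with lm() to show the intercept and slope, and then draws the trend line without its confidence band.

```r
library(ggplot2)

set.seed(1)  # make the invented data reproducible
demo <- data.frame(lon = runif(30, min = -80, max = -60))
demo$lat <- -10 + 0.3 * demo$lon + rnorm(30)

# Fit the same straight line directly to see its intercept and slope
fit <- lm(lat ~ lon, data = demo)
coef(fit)

# se = FALSE hides the shaded confidence band around the line
trend <- ggplot(demo, aes(x = lon, y = lat)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point()

# Type `trend` at the console to draw the plot
```

geom_smooth only draws the line; calling lm() yourself is one way to get the numbers behind it.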
Make a scatter plot of int3 vs int4 as you have before, but this time add a trendline.
geom_histogram() creates histograms. For histograms, values for the y-axis are calculated for you, so we just provide an x aesthetic:
tomato %>% ggplot(aes(x=hyp)) +
geom_histogram()
Histograms (and many other plots) can use the fill aesthetic to control the color used to fill the bars (or other shapes).
tomato %>% ggplot(aes(x=hyp)) +
geom_histogram(fill="red")
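The y values that a histogram calculates depend on how the data are binned, which you can control with the bins or binwidth arguments. The sketch below uses simulated values (demo is invented, not the tomato data) and layer_data() to look at the counts ggplot computed.

```r
library(ggplot2)

set.seed(2)  # make the invented data reproducible
demo <- data.frame(hyp = rnorm(200, mean = 30, sd = 8))

# bins sets how many bars to use; binwidth would instead set each bar's width
hist_plot <- ggplot(demo, aes(x = hyp)) +
  geom_histogram(bins = 15, fill = "red", color = "black")

# layer_data() returns the values ggplot calculated for the layer,
# including the count that is plotted on the y axis
bin_table <- layer_data(hist_plot, 1)
sum(bin_table$count)  # each observation is counted in exactly one bin
```

Choosing bins or binwidth explicitly also silences the message that ggplot prints when it falls back to its default of 30 bins.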
geom_density() creates density plots. Make a density plot below:
How would you describe the difference between a density plot and a histogram?
One nice thing about density plots is that we can compare the densities of different subsets of the data:
tomato %>% ggplot(aes(x=hyp, fill=trt)) +
geom_density(alpha=.5)
What is alpha doing? Experiment with different values; the allowable range is 0 to 1.
tomato %>% ggplot(aes(x=hyp, fill=trt)) +
geom_density(alpha=.5)
Alpha can be used in most geoms.
Boxplots and violin plots provide quick summaries of different classes of data. Suppose we want to examine hypocotyl length of each species. We can map hypocotyl length to the y-axis and species to the x-axis.
tomato %>% ggplot(aes(x=species, y=hyp)) +
geom_boxplot()
In a boxplot the horizontal line represents the median. Look at the help for geom_boxplot to determine what other components represent:
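Once you have read the help page, one way to check your understanding is to look at the numbers ggplot computes for each box. This is a small sketch on invented values (demo is a stand-in for the tomato data) that compares the middle line with the median calculated directly.

```r
library(ggplot2)

# Stand-in data: two made-up species with invented hypocotyl lengths
demo <- data.frame(species = rep(c("A", "B"), each = 6),
                   hyp     = c(21, 25, 28, 30, 33, 41,
                               15, 18, 19, 24, 26, 27))

box_plot <- ggplot(demo, aes(x = species, y = hyp)) +
  geom_boxplot()

# One row per box; ymin, lower, middle, upper and ymax describe the
# whiskers, the box edges and the line inside the box
box_stats <- layer_data(box_plot, 1)
box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]

# The middle value matches the median of each species
tapply(demo$hyp, demo$species, median)
```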
A related geom is geom_violin().
Remake the above plot using geom_violin.
Test your skills
Make a boxplot showing hypocotyl length for the “H” and “L” treatments.
If we add a color or fill aesthetic to a box or violin plot then we can start comparing multiple factors in our data.
Use the coding box below to re-create this plot:
Look at the plots to figure out the aesthetics: what is mapped to x, y, and fill? Once you know that you should be able to code it up!
What does this plot illustrate?
geom_col() allows you to make a classic bar chart, where the height of the bars corresponds to some value in the data. This works best for data summaries.
First let’s summarize our data:
sem <- function(x, na.rm=FALSE) {
sd(x,na.rm=na.rm)/sqrt(length(na.omit(x)))
}
int3.mean.sem <- tomato %>%
group_by(species, trt) %>%
summarize(mean=mean(int3, na.rm=TRUE), sem=sem(int3, na.rm=TRUE))
int3.mean.sem
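To make the sem() helper concrete, here is a small worked example on an invented vector (toy is not from the tomato data). It repeats the function definition so that it runs on its own.

```r
# Same helper as above: standard deviation divided by the square root
# of the number of non-missing values
sem <- function(x, na.rm = FALSE) {
  sd(x, na.rm = na.rm) / sqrt(length(na.omit(x)))
}

toy <- c(2, 4, 6, NA)   # invented values; three are non-missing

sem(toy, na.rm = TRUE)  # sd(c(2, 4, 6)) / sqrt(3)
sem(toy)                # NA, because sd() does not drop the missing value
```

This is why the summarize() call passes na.rm=TRUE to both mean() and sem(): a single missing measurement would otherwise turn the whole group summary into NA.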
int3.mean.sem %>% ggplot(aes(x=species, y = mean, fill=trt)) +
geom_col()
By default geom_col stacks the columns, which is perhaps not what we want. We can change that with the position argument:
int3.mean.sem %>% ggplot(aes(x=species, y = mean, fill=trt)) +
geom_col(position="dodge")
What if we want to add error bars? We use geom_errorbar and the ymin and ymax aesthetics:
int3.mean.sem %>% ggplot(aes(x=species,
y = mean,
fill=trt,
ymax=mean+sem,
ymin=mean-sem)) +
geom_col(position="dodge") +
geom_errorbar(position = position_dodge(width=0.9), width=.5)
Your turn…
Make a bar chart that shows average leaf length for each accession (acs) and trt combination.
Data appropriate for bar charts can also be plotted using lines:
int3.mean.sem %>% ggplot(aes(x=species,
y=mean,
color=trt,
group=trt,
shape=trt,
ymax=mean+sem,
ymin=mean-sem)) +
geom_line() +
geom_errorbar(width=.1) +
geom_point()
It actually doesn’t make a lot of sense to plot this data that way. However, plotting each species’ reaction to the treatment would. Modify the above code to make this plot:
There are several more geoms available. You can check the docs to see a listing.
ggplot does a nice job of automatically defining the scales, but what if you want something different? We add a call to a scale() function.
Consider this bar chart again:
What if we want the fill colors to be something else? We use scale_fill_manual():
int3.mean.sem %>% ggplot(aes(x=species, y = mean, fill=trt, ymax=mean+sem, ymin=mean-sem)) +
geom_col(position="dodge") +
geom_errorbar(position = position_dodge(width=0.9), width=.5) +
scale_fill_manual(values = c("H"="darkblue","L"="red"))
Note: you can get a list of possible colors with colors().
Change the colors in the plot below so that Dan and Pepe are different from the defaults. You can choose colors of your liking:
tomato %>% ggplot(aes(x=species, y = hyp, fill=who)) +
geom_violin()
Most aesthetics have similar scale commands that allow you to adjust how they are used. See scales
A particularly useful one is scale_y_log10(), which transforms the y-axis scale (there is an equivalent scale_x_log10()).
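As a rough illustration of what the log scale does, the sketch below plots invented values that span several orders of magnitude (the demo data frame and its dose and response columns are hypothetical, not part of the tomato data).

```r
library(ggplot2)

# Invented data: response grows ten-fold at each step
demo <- data.frame(dose     = 1:5,
                   response = c(1, 10, 100, 1000, 10000))

log_plot <- ggplot(demo, aes(x = dose, y = response)) +
  geom_point() +
  scale_y_log10()

# Positions are log10-transformed behind the scenes, while the axis
# labels still show the original units
layer_data(log_plot, 1)$y

# Type `log_plot` at the console to draw the plot
```

On the log scale the five points fall on a straight line, whereas on the default linear scale the smaller values would be squeezed against the bottom of the plot.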
You have seen that one way to split your data by categories is to map a categorical variable to an aesthetic. For example, the code below separates the data into “H” and “L” treatments before making the density plot.
tomato %>% ggplot(aes(x=int3, fill=trt)) +
geom_density(alpha=.5)
A second way to do this is to facet your data using facet_wrap() or facet_grid().
facet_wrap() uses a single variable for faceting and you can specify the number of rows or columns used in the layout.
tomato %>% ggplot(aes(x=int3)) +
geom_density(fill="lightblue") +
facet_wrap(~ trt)
Modify the code below so that the facets are arranged in columns instead of rows (hint, look at the help page for facet_wrap)
tomato %>% ggplot(aes(x=hyp)) +
geom_density(fill="papayawhip") +
facet_wrap(~ trt)
facet_grid() can use two variables to facet and uses those variables to specify the grid of rows and columns:
tomato %>% ggplot(aes(x=int3)) +
geom_histogram(fill="lawngreen") +
facet_grid(who ~ trt)
Practice by recreating the plot shown below:
The plot titles and labels are easily changed. Starting with this plot:
we can use ggtitle() to add a main title to the plot, and xlab() and ylab() to change the axis labels.
tomato %>% ggplot(aes(x=species,y=int3,fill=trt)) +
geom_boxplot() +
ylab("Internode 3 (mm)")
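The three label functions can be combined in one plot. The sketch below is an illustration on invented data (demo stands in for the tomato data, and the title and x label text are my own wording, not from the tutorial).

```r
library(ggplot2)

# Stand-in data: values are invented; only the column names mirror tomato
demo <- data.frame(species = rep(c("A", "B"), each = 4),
                   trt     = rep(c("H", "L"), times = 4),
                   int3    = c(10, 14, 12, 17, 20, 26, 22, 29))

labeled <- ggplot(demo, aes(x = species, y = int3, fill = trt)) +
  geom_boxplot() +
  ggtitle("Internode 3 length by species and treatment") +  # main title
  xlab("Species") +                                         # x-axis label
  ylab("Internode 3 (mm)")                                  # y-axis label

# The labels are stored with the plot
labeled$labels$title

# Type `labeled` at the console to draw the plot
```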
There are many more manipulations to labels, as detailed in “Titles”, “Axes” and “Legends” sections of the Cookbook for R.
If you want to save your plot to an external file you can use ggsave(). This will save the most recent plot to the path that you specify. R will figure out the appropriate file type from the file extension (pdf, png, jpg, tif). You can also specify height and width.
tomato %>% ggplot(aes(x=species,y=int3,fill=trt)) +
geom_boxplot() +
ylab("Internode 3 (mm)")
ggsave("~/Desktop/Internode3.pdf", height=6, width = 6)
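Relying on "the most recent plot" can be fragile in longer scripts, so one option is to store the plot in a variable and pass it to ggsave() through its plot argument. The sketch below uses invented data and writes to a temporary folder rather than the Desktop; both choices are assumptions made so the example runs anywhere.

```r
library(ggplot2)

# Stand-in data: values are invented; only the column names mirror tomato
demo <- data.frame(species = rep(c("A", "B"), each = 4),
                   trt     = rep(c("H", "L"), times = 4),
                   int3    = c(10, 14, 12, 17, 20, 26, 22, 29))

box_plot <- ggplot(demo, aes(x = species, y = int3, fill = trt)) +
  geom_boxplot() +
  ylab("Internode 3 (mm)")

# Write to R's temporary directory; swap in your own path as needed
out_file <- file.path(tempdir(), "Internode3.pdf")

# plot = names the plot to save; height and width are in inches here
ggsave(out_file, plot = box_plot, height = 6, width = 6, units = "in")

file.exists(out_file)  # confirm that the file was written
```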
This is the end of the tutorial.
As noted at the beginning, there are several sources for additional information, including: