A scatterplot shows the relationship between two numeric variables on an x-y coordinate plot.

This script illustrates scatter plots in R and contains basic instructions on how to customise figures and render publishable graphs.

Data: iris (in-built dataset in R base)

1 Housekeeping

Remember that rm(list=ls()) is not sufficient for a fully clean set up of R. You should use Ctrl/Cmd-SHIFT-F10 in RStudio to restart R cleanly, and check that your script still runs from a clean session at least at the end of your analysis, or periodically during development.

# remove (almost) all objects currently held in the R environment
rm(list=ls()) 

In this example we will use the in-built dataset “iris” which gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.

See ?iris for more information.
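
As a quick check of the description above, the following sketch inspects the structure of the dataset and counts the flowers per species. The object name species.counts is only illustrative and is not part of the original script.

```r
# inspect the structure: column names, types and the first few values
str(iris)

# count the number of flowers recorded for each species
species.counts <- table(iris$Species)
species.counts

# numeric summary of every column
summary(iris)
```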

2 Help on file

head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
# help(iris) or ?iris will give you more information about the data set
?plot # info about plot function

3 Plot and explore your data

We will start with the petal length and width data and a basic plot.

3.1 First a basic plot

You have to be very careful to put the right variable on the x-axis and on the y-axis. There are two ways to specify which axis is which using the plot() function. In the programmatic version plot(x, y), the first argument inside the brackets is the x variable and the second is the y variable. In the formula version plot(y ~ x), y is plotted against x, which means y is treated as the response variable and plotted on the y-axis, and x is on the x-axis.

The formula version plot(y ~ x) offers several advantages and we strongly suggest you use the formula version of all functions where possible (sadly not all functions take the formula method). One big advantage is that the formula method lets you specify the data.frame object to use explicitly, meaning you do not need to use the $ notation. The other big advantage is that it mirrors the calls to the modelling functions for fitting lines to data, which we will come to in subsequent lectures.

First the plot(x,y) method.

plot( iris$Petal.Width, iris$Petal.Length )

And our preferred formula method plot(y ~ x).

plot( Petal.Length ~ Petal.Width, data = iris )

For each flower there is one dot as one observation of Petal.Width associated with one observation of Petal.Length.

You can see some kind of allometric scaling: as Petal.Width increases, Petal.Length increases as well.
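
To put a number on the pattern seen in the plot, one option is the Pearson correlation coefficient. This is a minimal sketch rather than part of the original script; the object names petal.cor and cor.by.species are illustrative.

```r
# Pearson correlation between petal width and petal length, all flowers pooled
petal.cor <- cor(iris$Petal.Width, iris$Petal.Length)
petal.cor

# the same calculation within each species:
# split() cuts the data frame into one piece per species,
# sapply() applies cor() to each piece and returns a named vector
cor.by.species <- sapply(split(iris, iris$Species),
                         function(d) cor(d$Petal.Width, d$Petal.Length))
cor.by.species
```

Comparing the pooled value with the within-species values shows how much of the overall trend comes from differences between species.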

3.2 Plot for publication

Now tidy up the labels, make the fonts bigger, and add a line of best fit through the points (see the podcast on linear regression for more details).

# xlab and ylab set the x- and y-axis labels
# pch sets the point type; pch=20 is a small round dot
# cex is character expansion: cex.lab for axis labels, cex.axis for tick
#   labels and cex for the points
# bty is the box type drawn around the plot
# las is the style of axis labels, numeric in {0,1,2,3}:
#   0: always parallel to the axis [default], 1: always horizontal,
#   2: always perpendicular to the axis, 3: always vertical
# tcl is the length of tick marks as a fraction of the height of a line of
#   text; the default value is -0.5, so a positive value such as 0.5 draws
#   the ticks inside the plot region; setting tcl = NA sets tck = -0.01,
#   which is S' default
plot( Petal.Length ~ Petal.Width, data = iris,
        xlab="Petal Width (cm)", ylab="Petal Length (cm)", pch=20 ,
        cex.lab=1.5, cex.axis=1.5, cex=1.2,
        bty="L", las=1, tcl=0.5 )
abline(lm(Petal.Length~Petal.Width, data=iris),col="black", lwd=2, lty=1)
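
The abline() call above fits the model and draws it in one step. If you also want to see the fitted intercept and slope, a sketch along the following lines stores the model first; the object name fit is an illustrative choice.

```r
# fit the linear model once and keep it as an object
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# intercept and slope of the line of best fit
coef(fit)

# draw the scatter plot and add the stored line
plot(Petal.Length ~ Petal.Width, data = iris, pch = 20)
abline(fit, col = "black", lwd = 2, lty = 1)
```

Note that the formula passed to lm() is the same one passed to plot(), which is the advantage of the formula method described earlier.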

4 Help on graph settings

?par
?points

5 Graph with different species in colour

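This graph includes information on the different species, again with tidy labels and bigger fonts. We can also draw a line for the mean of each species. The grouping variable (Species) must be categorical. The indexing used below, such as my.colors[iris$Species], relies on Species being a factor, so a character vector should be converted with as.factor() first.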
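
The following sketch shows why indexing a vector of colours by a factor gives one colour per flower. It repeats the definition of my.colors so that it runs on its own; species.codes and point.colors are illustrative names.

```r
# three colours, one per level of Species
my.colors <- c("black", "blue", "green")

# a factor is stored as integer codes (1, 2, 3) with the level names attached
species.codes <- as.integer(iris$Species)
unique(species.codes)

# indexing by the factor uses those integer codes,
# so every flower receives the colour of its species
point.colors <- my.colors[iris$Species]
table(iris$Species, point.colors)

# a character vector would index by name instead, and because my.colors
# has no names this returns NA
my.colors[as.character(iris$Species)][1:3]
```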

# define a custom sequence of colours to use
my.colors <- c("black","blue","green")

# define a custom sequence of point types to use
my.points <- c(16,17,18)

# display levels of variable Species within iris data set
levels(iris$Species) 
## [1] "setosa"     "versicolor" "virginica"
# generate the plot with colours and points by species
plot( Petal.Length ~ Petal.Width, data = iris, 
      col=my.colors[iris$Species], 
      pch=my.points[iris$Species], 
      xlab="Petal Width (cm)", ylab="Petal Length (cm)",
      cex.lab=1.5, cex.axis=1.5, cex=1.2,
      bty="L", las=1, tcl=0.5 )

# add a legend to the plot
legend("topleft", levels(iris$Species), col=my.colors, pch=my.points, lty=0, bty="n",
        cex=1.5) # bty = "n" means no box around legend

# add a horizontal line at the mean petal length of each species
abline(h=mean(iris$Petal.Length[iris$Species=="virginica"]), col="green", lty=2)
abline(h=mean(iris$Petal.Length[iris$Species=="versicolor"]), col="blue", lty=2)
abline(h=mean(iris$Petal.Length[iris$Species=="setosa"]), col="black", lty=2)
# palette()
# colors()

# add some text to the figure in glorious peachpuff4
text(1.5,2,labels="this graph pwns", col="peachpuff4",cex=2 ) 

# some other embellishments you might want to add...
# add some specific lines or points
points(2.5,2,pch=10,col="red",cex=2) 
lines(c(2.2,2.5),c(2,3),col="magenta",lwd=2)
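
The three abline() calls for the species means can also be written more compactly. This is a sketch of one alternative, assuming the same colour order as my.colors above; species.means is an illustrative name.

```r
my.colors <- c("black", "blue", "green")

# mean petal length for each level of Species, in level order
species.means <- tapply(iris$Petal.Length, iris$Species, mean)
species.means

# draw the points, then all three horizontal lines in a single call;
# abline() accepts a vector for h and recycles col to match
plot(Petal.Length ~ Petal.Width, data = iris, col = my.colors[iris$Species])
abline(h = as.numeric(species.means), col = my.colors, lty = 2)
```

Because tapply() returns the means in the order of levels(iris$Species), the colours line up with the species without naming each one by hand.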

6 Save your data (only if you want)

The “list=” argument tells save() which objects we want to save; here list=ls() saves every object in the current environment. The “file=” argument tells it what file to save the data to.

save( list=ls(), file="grazing_data.rdata")
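
To see what save() stores and how to get it back, the sketch below saves two named objects and restores them with load(). It writes to a temporary file (an assumption made here so that nothing in your working directory is touched); tmp.file and restored are illustrative names.

```r
# a temporary file path, removed automatically when the R session ends
tmp.file <- tempfile(fileext = ".rdata")

# two small objects to save
my.colors <- c("black", "blue", "green")
my.points <- c(16, 17, 18)

# save only the objects named in list=
save(list = c("my.colors", "my.points"), file = tmp.file)

# remove them, then restore them from the file
rm(my.colors, my.points)
restored <- load(tmp.file)

# load() returns the names of the objects it restored
restored
```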

Once you are happy with your graph, you can export it as a high resolution tiff, for example. This is done by opening a blank *.tif file in which to print the figure, creating the figure, and then closing the file, which finalises writing it to disk.

Check your working directory before saving your plot, because the plot will be saved in that location.
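
A minimal way to check where the file will end up, assuming the file name used in the example that follows; out.path is an illustrative name.

```r
# show the current working directory
getwd()

# the full path a relative file name will resolve to
out.path <- file.path(getwd(), "petal_plot_1.tif")
out.path

# once the plot has been written, file.exists(out.path) confirms it is there
```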

# create and open the tif file for writing
tiff(filename="petal_plot_1.tif",units="cm",
        height=21,width=21, res=300, compression="none")

# do the plotting
my.colors <- c("black","blue","green")
my.points <- c(16,17,18)
plot(Petal.Length ~ Petal.Width, data = iris,
        col = my.colors[iris$Species], 
        pch = my.points[iris$Species], 
        xlab = "Petal Width (cm)", ylab="Petal Length (cm)",
        cex.lab = 1.5, cex.axis = 1.5, cex = 1.2,
        bty = "L", las = 1, tcl = 0.5 )
legend("topleft", levels(iris$Species), col=my.colors, pch=my.points, lty=0, bty="n",
        cex=1.5) # bty = "n" means no box around legend

# close the tif file
dev.off()
## quartz_off_screen 
##                 2

7 Nicer (and sort of easier) graphics using ggplot

If you want the most sophisticated scatter plot with lots of options, you should use the package ggplot2. It is more complex than base graphics in R, it follows a very different syntax, and it must be installed first, either on its own or as part of the tidyverse set of packages.

Core elements:

Data
Aesthetics - aes()
Geoms - geom_*()
Themes - theme()
Guides - guides()

Data

The information you want to communicate. This information is generally stored in a data frame, matrix or table.

Aesthetics

The aesthetic attributes you use to represent your data. These translate the information stored in columns in your data frame into visual properties of the plot such as point location, bar height, colour, size, shape, etc.

Geoms

These are the geometric objects on the plot, e.g. lines, bars and points.

Themes

These adjust the overall appearance of the plot, e.g. background colour.

Guides

These objects help the viewer to interpret the plot, e.g. axis labels and legends.
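
The five elements described above map onto code roughly as follows. This is a minimal sketch that assumes ggplot2 is already installed; the object name p.sketch and the legend title are illustrative choices.

```r
library(ggplot2)

p.sketch <- ggplot(data = iris,                              # Data
                   mapping = aes(x = Petal.Width,            # Aesthetics
                                 y = Petal.Length,
                                 colour = Species)) +
  geom_point() +                                             # Geoms
  theme_bw() +                                               # Themes
  guides(colour = guide_legend(title = "Iris species"))      # Guides

# show the plot on screen
print(p.sketch)
```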

You will need to install the tidyverse set of packages if you do not already have them: install.packages("tidyverse")

With ggplot2, you can create the figure without printing it anywhere by storing the instructions behind the figure as an object. You can then use print() to show it on screen or embed it into an R Markdown document. In contrast to the method above, where you have to open and close a plotting device (we used a *.tif file), you can use ggsave() to achieve the same result. The advantage here is that you can both print the figure and save it to file easily. With ggsave(), there are lots of file formats you can save to, including png, jpg, etc. We will not cover ggplot2 in detail here, but you can find a lot of help online.

library(tidyverse) # load package with ggplot
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
# create the plot and save it in object called p2
p2 <- ggplot(data = iris, 
             mapping = aes(x = Petal.Width, y = Petal.Length)) + 
  geom_point(mapping = aes(color = Species, shape=Species)) + 
  geom_smooth() + 
  ggtitle("Petal width and length plot") + 
  theme(plot.title = element_text(hjust = 0.5), 
        axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold")) + 
  labs( x="Petal Width (cm)",y="Petal Length (cm)") 

# printing the created ggplot object causes it to be shown on screen
print(p2)
## `geom_smooth()` using method = 'loess'

# save it out as a tiff or myriad other formats
ggsave(filename="petal_plot_2.tif", plot = p2, device = "tiff",
       dpi = 300, compression = "none")
## Saving 7 x 5 in image
## `geom_smooth()` using method = 'loess'
# aes() defines the x and y axes; geom_point() maps the classification
# variable Species to shape and colour; geom_smooth() adds a smoothing curve
# through the data;
# ggtitle() defines the title;
# theme(plot.title = ...) centres the title on the plot;
# axis.text - defines text size for the axis tick labels;
# axis.title - defines size and style of the axis titles;
# labs() defines the x- and y-axis labels.
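
The output above shows that geom_smooth() chose a loess curve. If you would rather draw a straight line of best fit, as in the base graphics example with abline(lm(...)), one option is to set the method explicitly. This is a sketch assuming ggplot2 is installed; p.lm is an illustrative name.

```r
library(ggplot2)

# same scatter plot, but with a linear regression line instead of loess
p.lm <- ggplot(data = iris,
               mapping = aes(x = Petal.Width, y = Petal.Length)) +
  geom_point(mapping = aes(colour = Species, shape = Species)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Petal Width (cm)", y = "Petal Length (cm)")

print(p.lm)
```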
library(tidyverse)

# create the plot and save it in object called p3
p3 <- ggplot(data= iris, mapping = aes(x=Petal.Width,y=Petal.Length)) + 
  geom_point(mapping = aes(color = Species, shape=Species)) + 
  geom_smooth() + 
  ggtitle("Petal width and length plot") + 
  scale_shape_manual (values=4:6) + 
  theme(plot.title = element_text(hjust = 0.5), 
        axis.text=element_text(size=12),
        axis.title=element_text(size=14,face="bold")) + 
  labs( x="Petal Width (cm)",y="Petal Length (cm)")
# the only difference from the previous graph is that the shape used for each
# level of Species is defined manually with scale_shape_manual. 

# save it out as a tiff or myriad other formats
ggsave(filename="petal_plot_3.tif", plot = p3, device = "tiff",
       dpi = 300, compression = "none")
## Saving 7 x 5 in image
## `geom_smooth()` using method = 'loess'
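
With values=4:6 the shapes are assigned to the species in level order. If you prefer to state the pairing explicitly, scale_shape_manual() also accepts a named vector. This is a sketch assuming ggplot2 is installed; species.shapes and p.named are illustrative names.

```r
library(ggplot2)

# name each shape code after the species it should represent
species.shapes <- c(setosa = 4, versicolor = 5, virginica = 6)

p.named <- ggplot(data = iris,
                  mapping = aes(x = Petal.Width, y = Petal.Length)) +
  geom_point(mapping = aes(colour = Species, shape = Species)) +
  scale_shape_manual(values = species.shapes) +
  labs(x = "Petal Width (cm)", y = "Petal Length (cm)")

print(p.named)
```

Naming the values means the pairing no longer depends on the order of the factor levels.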