1 Housekeeping
2 Histogram
3 Find and explore your base data
4 Plot a frequency histogram of your data and inspect histogram data
5 Plot a density histogram of your data
6 Save your histogram

1 Housekeeping

Remember that rm(list=ls()) is not sufficient for a fully clean R session: it removes objects but leaves loaded packages and changed options in place. Use Ctrl/Cmd-Shift-F10 in RStudio to restart R cleanly, and check that your script still runs after a restart, at least at the end of your analysis and ideally periodically during development.

# remove (almost) all objects currently held in the R environment
rm(list=ls())
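
A minimal sketch of why removing objects is not the same as a clean restart. The object name demo_value is hypothetical and used only for this illustration: the object disappears, but a changed global option survives until it is reset or R is restarted.

```r
old_opts <- options(digits = 4)          # change a global option, keeping the old value
demo_value <- pi                         # hypothetical object, only for this illustration
rm(demo_value)                           # removes the object ...
object_gone <- !exists("demo_value")     # TRUE: the object is no longer there
digits_after_rm <- getOption("digits")   # ... but the option is still set to 4
options(old_opts)                        # restore the previous setting
```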

2 Histogram

Histograms are used to present the distribution of a continuous variable. How well a histogram represents the data depends on the width of the intervals (bins) used to group the data.

3 Find and explore your base data

data() # display all built in datasets
?iris # help for data set iris
iris # display iris data set

##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica

head(iris) # display first six rows of the iris data set

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
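
Besides printing the data, a few base R summaries give a quicker overview before plotting. This is a small sketch using only base functions; the name species_counts is introduced here for illustration.

```r
str(iris)                               # column names, types and first values
summary(iris$Petal.Length)              # minimum, quartiles, mean and maximum
species_counts <- table(iris$Species)   # number of rows per species
species_counts
```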

4 Plot a frequency histogram of your data and inspect histogram data

The first histogram shows petal length from the iris data frame. The y-axis shows the frequency, that is, the number of observations in each bin.

length(iris$Petal.Length) # number of observations for petal length in data set iris

## [1] 150

histInformation <- hist(iris$Petal.Length) # display histogram of variable Petal.Length from iris data set

histInformation # displays information about histogram

## $breaks
##  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
## 
## $counts
##  [1] 37 13  0  1  4 11 21 21 17 16  5  4
## 
## $density
##  [1] 0.49333333 0.17333333 0.00000000 0.01333333 0.05333333 0.14666667
##  [7] 0.28000000 0.28000000 0.22666667 0.21333333 0.06666667 0.05333333
## 
## $mids
##  [1] 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75 5.25 5.75 6.25 6.75
## 
## $xname
## [1] "iris$Petal.Length"
## 
## $equidist
## [1] TRUE
## 
## attr(,"class")
## [1] "histogram"
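
The object returned by hist() is a list, so its components can be used directly. The sketch below uses plot = FALSE to compute the bins without drawing; the variable names are chosen for illustration.

```r
h <- hist(iris$Petal.Length, plot = FALSE)   # compute the bins without drawing
total_obs   <- sum(h$counts)                 # every observation falls in exactly one bin
n_bins      <- length(h$counts)              # always one fewer than the number of breaks
fullest_mid <- h$mids[which.max(h$counts)]   # midpoint of the bin with most observations
c(total_obs = total_obs, n_bins = n_bins, fullest_mid = fullest_mid)
```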

5 Plot a density histogram of your data

The option freq=FALSE (or, equivalently, prob=TRUE) creates a plot based on probability densities rather than frequencies. The two calls below draw the same histogram.

hist(iris$Petal.Length, freq=FALSE) 
hist(iris$Petal.Length, prob=TRUE)
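
To see what the density scale means, the densities can be recomputed by hand: each one is the bin count divided by the total number of observations times the bin width, so the bar areas sum to 1. A short sketch, with variable names chosen for illustration:

```r
h <- hist(iris$Petal.Length, plot = FALSE)
bin_width      <- diff(h$breaks)                          # width of each bin
manual_density <- h$counts / (sum(h$counts) * bin_width)  # count / (n * width)
same_as_hist   <- all.equal(manual_density, h$density)    # matches hist()'s own values
total_area     <- sum(h$density * bin_width)              # bar areas add up to 1
```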

# Histogram additional options

5.1 Changing the number of bins (breaks OPTION)

Bins are defined in a histogram with the breaks option. Narrower bins show more of the shape of the distribution; however, bins that are too narrow present too much detail. The default in hist() is breaks = "Sturges". An alternative is the Freedman-Diaconis rule, which sets the bin width to $h = 2 \cdot \mathrm{IQR} \cdot n^{-1/3}$; in base R you can use hist(x, breaks = "FD"). When breaks is a single number or the name of a rule, R treats it only as a suggestion and rounds the break points to "pretty" values, so the number of bins drawn may not match the number you asked for exactly.

hist(iris$Petal.Length, freq=FALSE,  breaks = 20)

hist(iris$Petal.Length, freq=FALSE,  breaks = 50)

hist(iris$Petal.Length, freq=FALSE,  breaks = "Sturges")

hist(iris$Petal.Length, freq=FALSE,  breaks = "FD")

hist(iris$Petal.Length, freq=FALSE,  breaks = seq(from=0, to=10, by=2))

hist(iris$Petal.Length, breaks = seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 11)) # exactly 10 bins - remember that n bins need n+1 break points
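
The two rules can also be worked through by hand. The sketch below computes the Freedman-Diaconis bin width and the resulting number of bins, compares them with the base R helpers nclass.FD() and nclass.Sturges(), and then counts how many bins hist() actually draws for the default rule. Variable names are chosen for illustration.

```r
x <- iris$Petal.Length
fd_width     <- 2 * IQR(x) * length(x)^(-1/3)        # Freedman-Diaconis bin width h
fd_bins      <- ceiling(diff(range(x)) / fd_width)   # bins needed to cover the data range
sturges_bins <- ceiling(log2(length(x)) + 1)         # Sturges' formula
# number of bins hist() really uses once the breaks are rounded to pretty values
drawn_bins   <- length(hist(x, breaks = "Sturges", plot = FALSE)$counts)
c(fd = fd_bins, nclass_fd = nclass.FD(x),
  sturges = sturges_bins, nclass_sturges = nclass.Sturges(x),
  drawn = drawn_bins)
```

The suggested and the drawn number of bins differ for the default rule, which is the rounding to "pretty" break points at work.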

5.2 Adding Y axis limits (ylim OPTION)

If you want to specify y axis limits you can do this with ylim option:

hist(iris$Petal.Length, freq=FALSE, ylim=c(0, 0.6))

5.3 Main title (main OPTION)

main is the plotting option that puts a title on a graph. Use \n inside the title text to break it over two lines.

hist(iris$Petal.Length, freq=FALSE, ylim=c(0, 0.6), main="Main title") # in one line

hist(iris$Petal.Length, freq=FALSE, ylim=c(0, 0.6), main="Main \n title") # in two lines

5.4 Add axis labeling (xlab, ylab OPTION)

hist(iris$Petal.Length, freq=FALSE, ylim=c(0, 0.6), main="Main title", xlab="Petal length \n (cm)", ylab="Density")

5.5 Y axis text orientation (las OPTION)

hist(iris$Petal.Length, freq=FALSE, ylim=c(0, 0.6), main="Main title", xlab="Petal length \n (cm)", ylab="Density", las=1)

5.6 Adding a density plot

hist(iris$Petal.Length, freq=FALSE, ylim=c(0, 0.6), main="Main title", xlab="Petal length \n (cm)", ylab="Density", las=1)
lines(density(iris$Petal.Length), col=2, lwd=3)

5.7 Adding normal distribution plot

PL <- iris$Petal.Length
hist(PL, prob=TRUE)
# curve() evaluates the expression over its own grid of x values, so no separate x vector is needed
curve(dnorm(x, mean=mean(PL), sd=sd(PL)), add=TRUE)

5.8 Adding bar labels for each bin (labels OPTION)

hist(iris$Petal.Length, freq=FALSE, main="Main title", xlab="Petal length \n (cm)", ylab="Density", las=1, labels=TRUE, ylim=c(0, 0.6)) # with freq=FALSE the labels are rounded densities; with freq=TRUE they are counts
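
To keep the density scale but still print the number of observations above each bar, one option is to add the counts yourself with text(). This is a sketch rather than the only way to do it; it relies on the mids, density and counts components returned by hist().

```r
h <- hist(iris$Petal.Length, freq = FALSE, ylim = c(0, 0.6), las = 1,
          main = "Main title", xlab = "Petal length (cm)", ylab = "Density")
# place each count just above the top of its bar (pos = 3 means "above")
text(x = h$mids, y = h$density, labels = h$counts, pos = 3)
```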

5.9 Adding median and mean to the histogram

hist(iris$Petal.Length, freq=FALSE, breaks = 30, main = "Petal length histogram", xlab ="Petal length \n (cm)", ylim=c(0, 0.8), las=1, col="grey")
abline(v=c(mean(iris$Petal.Length), median(iris$Petal.Length)), lty=c(1,3), lwd =2) # lty = 1 (solid line), lty = 2 (dashed line), lty = 3 (dotted line)
legend("topright", legend=c("mean Petal length", "median Petal length"), lty=c(1,3), lwd =2)

mean(iris$Petal.Length)

## [1] 3.758

median(iris$Petal.Length)

## [1] 4.35

We can alter the way in which the breaks and hence bins are created and drawn.

par(mfrow=c(2,2)) # specify a 2x2 panel plot
hist(iris$Petal.Length,breaks="Sturges",main="Sturges Method (default)")
hist(iris$Petal.Length,breaks=30,main="30 bins")
hist(iris$Petal.Length,breaks=50,main="50 bins")
hist(iris$Petal.Length,breaks=seq(0,7,0.5),main="bin breaks every 0.5 cm")

THIS IS NOT AN INFORMATIVE WAY TO PRESENT DATA!

seq(0,7,0.5)

##  [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0

hist(iris$Petal.Length,breaks=seq(0,7,0.5),main="bin breaks every 0.5 cm")

par(mfrow=c(2,2)) # specify a 2x2 panel plot
hist(iris$Petal.Length[iris$Species=="setosa"],  main="setosa", xlab="Petal Length (cm)", ylab="Frequency")
hist(iris$Petal.Length[iris$Species=="versicolor"], main="versicolor", xlab="Petal Length (cm)", ylab="Frequency")
hist(iris$Petal.Length[iris$Species=="virginica"], main="virginica", xlab="Petal Length (cm)", ylab="Frequency")
iris$Species=="setosa"

##   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [34]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
##  [45]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [122] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [144] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

NOTE: == is interpreted as the question “is it equal to?”, not as the statement “is equal to”! It returns TRUE or FALSE for each element.
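
A short sketch of how such a logical vector is used: TRUE counts as 1 when summed, and inside square brackets it keeps only the matching elements. The names is_setosa and setosa_pl are introduced here for illustration.

```r
is_setosa <- iris$Species == "setosa"        # one TRUE/FALSE per row
n_setosa  <- sum(is_setosa)                  # TRUE counts as 1, FALSE as 0
setosa_pl <- iris$Petal.Length[is_setosa]    # keep only petal lengths of setosa
c(n_setosa = n_setosa, n_selected = length(setosa_pl))
```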

THE SAME DATA SHOULD BE PRESENTED LIKE THIS: the same x and y axes in every panel, so that the distributions can be compared between species.

par(mfrow=c(3,1)) # specify a 3x1 panel plot

# Species == setosa
hist(iris$Petal.Length[iris$Species=="setosa"], breaks=seq(0,7,0.25),
      main="setosa", xlab="", ylab="", cex.lab=1.5, ylim=c(0, 40), las=1, col="lightgrey")

# Species == versicolor
hist(iris$Petal.Length[iris$Species=="versicolor"], breaks=seq(0,7,0.25),
      main="versicolor", xlab="", ylab="Frequency", ylim=c(0, 40), las=1, cex.lab=1.5, col="lightgrey")

# Species == virginica
hist(iris$Petal.Length[iris$Species=="virginica"], breaks=seq(0,7,0.25),
      main="virginica", xlab="Petal Length (cm)", ylab="", ylim=c(0, 40), las=1, cex.lab=1.5, col="lightgrey")
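
The same layout can be sketched with a loop, letting R work out a common y-axis limit from the data instead of fixing it by hand. This is one possible approach, not the code used for the figures above; bin_breaks, by_species and y_max are names chosen for illustration.

```r
bin_breaks <- seq(0, 7, 0.25)                            # same bins for every species
by_species <- split(iris$Petal.Length, iris$Species)     # one vector per species
# count observations per bin without drawing, to find the tallest bar overall
bin_counts <- lapply(by_species,
                     function(pl) hist(pl, breaks = bin_breaks, plot = FALSE)$counts)
y_max <- max(unlist(bin_counts))

op <- par(mfrow = c(3, 1))                               # 3x1 panel, old settings kept in op
for (sp in names(by_species)) {
  hist(by_species[[sp]], breaks = bin_breaks, ylim = c(0, y_max),
       main = sp, xlab = "Petal Length (cm)", ylab = "Frequency",
       las = 1, col = "lightgrey")
}
par(op)                                                  # restore the previous layout
```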

6 Save your histogram

If you want to save your histogram, you can export it as an image (png, jpeg, tiff, bmp, svg, eps) or as a pdf, or copy it to the clipboard.
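
A histogram can also be saved from code by opening a graphics device, plotting, and closing the device again. The sketch below writes a pdf to a temporary folder; the file name is hypothetical, and png(), jpeg(), tiff(), bmp() and svg() are opened and closed in the same way.

```r
out_file <- file.path(tempdir(), "petal_length_hist.pdf")  # hypothetical file name
pdf(out_file, width = 7, height = 5)    # open a pdf device (size in inches)
hist(iris$Petal.Length, main = "Petal length histogram",
     xlab = "Petal length (cm)")
dev.off()                               # close the device so the file is written
saved_ok <- file.exists(out_file)       # TRUE if the file was created
```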

ASSIGNMENT 1:

Using the genotype data set from the MASS package, create four histograms, one above the other (use the par function), one for each litter (A, B, I and J). Each histogram should be a density histogram with bins from 30 to 70 in steps of 10, with labels, and a y-axis scaled from 0 to 0.2. The x-axis label should be weight, and the colours should be green for Litter A, blue for Litter B, red for Litter I and brown for Litter J.

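Solution (the code below works through the same steps on the built-in PlantGrowth data set, with one histogram per group)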

library(MASS)
par(mfrow=c(3, 1))
hist(PlantGrowth$weight[PlantGrowth$group=="ctrl"], main="Group ctrl", freq=FALSE, breaks=seq(3,7,1),  labels=TRUE, ylim=c(0, 1), xlab="Weight",col="gray")
hist(PlantGrowth$weight[PlantGrowth$group=="trt1"], main="Group trt1", freq=FALSE, breaks=seq(3,7,1),  labels=TRUE, ylim=c(0, 1), xlab="Weight",col="gray")
hist(PlantGrowth$weight[PlantGrowth$group=="trt2"], main="Group trt2", freq=FALSE, breaks=seq(3,7,1),  labels=TRUE, ylim=c(0, 1), xlab="Weight",col="gray")
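
The code above uses PlantGrowth. For the genotype data named in the assignment, a sketch along the same lines could look as follows. It assumes that "density function" means a density histogram (freq=FALSE) and uses the Litter and Wt columns of MASS::genotype; the names litters and litter_cols are introduced here for illustration.

```r
library(MASS)                                   # provides the genotype data set
litters     <- c("A", "B", "I", "J")
litter_cols <- c(A = "green", B = "blue", I = "red", J = "brown")

op <- par(mfrow = c(4, 1))                      # four panels, one above the other
for (l in litters) {
  hist(genotype$Wt[genotype$Litter == l], freq = FALSE,
       breaks = seq(30, 70, 10), labels = TRUE, ylim = c(0, 0.2),
       main = paste("Litter", l), xlab = "Weight", col = litter_cols[[l]])
}
par(op)                                         # restore the previous layout
```

If the plotting window is small, R may report that the figure margins are too large; enlarging the window or saving to a file avoids this.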

Lecture 5: Histograms

27 September 2017