Controls & Loops

Control structures let you direct the flow of execution in a script, typically inside a function. Common ones include:

  • if, else
  • for
  • while
  • repeat
  • break
  • next
  • function

Control statements

If statement

if (condition)
{
  # do something
} 

Sometimes, we also want to do something when the condition evaluates to FALSE. In this case, we add an else statement.

If else statement

if (condition)
{
  # do something
} 
else 
{
  # do something else
}

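Hadley Wickham has published a style guide for R at http://adv-r.had.co.nz/Style.html:

“An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else. Always indent the code inside curly braces.”

This is more than a matter of taste. When code is run at the top level (outside a function or other braces), R treats the if statement as complete once it reaches the closing brace, so an else that starts the next line causes a syntax error. Keeping } else { on one line avoids this.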

If else statements with better styling

if (condition) {
  # do something
} else {
  # do something else
}

An example of an if..else statement:

x <- 1
if (x > 1) {
  print("x is greater than 1")
} else {
  print("x is less than or equal to 1")
} 
## [1] "x is less than or equal to 1"

If, else if and else statements

if (condition1) {
  # do something
} else if (condition2) {
  # do something else
} else {
  # do something different
}

For example:

x <- 1
if (x > 1) {
  print("x is greater than 1")
} else if (x < 1) {
  print("x is less than 1")
} else {
  print("x is equal to 1")
} 
## [1] "x is equal to 1"
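
An if statement tests a single TRUE or FALSE value. When the same test has to be applied to every element of a vector, base R's ifelse() function is one option. The sketch below is illustrative only: the vector scores and its values are made up for this example.

```r
# Hypothetical example vector (not part of the tutorial data)
scores <- c(0, 1, 2)

# ifelse(test, yes, no) evaluates the test for each element of the vector
labels <- ifelse(scores > 1, "greater than 1", "not greater than 1")
print(labels)
```

The result has one label per element of scores, so no loop is needed.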

Quiz

Write a set of if..else statements that test a variable “mhi5_score”. Print “depression” if it is less than 52 and “control” if it is greater than or equal to 52.

mhi5_score <- 45
# put your if else statements here

Loops

For loops

A for-loop iterates over a sequence (such as a vector), assigning each element in turn to the loop variable until the end of the sequence is reached.

for (i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

A for-loop can also use the loop variable as an index into a vector.

x <- c("apples", "oranges", "bananas", "strawberries")

for (i in 1:4) {
  print(x[i])
}
## [1] "apples"
## [1] "oranges"
## [1] "bananas"
## [1] "strawberries"

As a shorthand, the body of the loop can go on the same line as the for-statement.

for (i in 1:4) print(x[i])
## [1] "apples"
## [1] "oranges"
## [1] "bananas"
## [1] "strawberries"

We can also loop over the elements of the vector directly, rather than over its indices.

for (i in x) print(i)
## [1] "apples"
## [1] "oranges"
## [1] "bananas"
## [1] "strawberries"
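
The loops above hard-code the length of the vector as 1:4. A common alternative, sketched here, is seq_along(), which generates the indices from the vector itself. The empty vector and the iterations counter are hypothetical and only there to show the edge case.

```r
x <- c("apples", "oranges", "bananas", "strawberries")

# seq_along(x) generates 1, 2, ..., length(x)
for (i in seq_along(x)) {
  print(paste(i, x[i]))
}

# With an empty vector, 1:length(empty) would be c(1, 0) and the loop would
# run twice; seq_along(empty) is empty, so the loop body never runs
empty <- character(0)
iterations <- 0
for (i in seq_along(empty)) {
  iterations <- iterations + 1
}
print(iterations)
```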

For loop examples

We’ve seen that we can use for loops with vectors. We can also use for loops with data frames.

# read in the gambling data from Day 1
gambling.data <- read.csv(file = "http://data.justice.qld.gov.au/JSD/OLGR/20170817_OLGR_LGA-EGM-data.csv",
                 header = TRUE,
                 sep = ",",
                 stringsAsFactors = FALSE)

# rename columns
names(gambling.data)[2] <- "Local.Govt.Area"
names(gambling.data)[7] <- "Player.Money.Lost"

# Add a day of month (1st) to each date string
date.string <- paste0("1 ", gambling.data$Month.Year)

# Convert to POSIXlt, a date-time format
gambling.data$Date <- strptime(date.string, format = "%d %B %Y")

# subset to Brisbane only
brisbane.only <- gambling.data[gambling.data$Local.Govt.Area == "BRISBANE", ]
row.indices <- (brisbane.only$Date >= "2010-01-01 AEST" &
                brisbane.only$Date <= "2010-12-31 AEST")

# the outer parentheses print the result of the assignment
(brisbane.2010.data <- brisbane.only[row.indices, ])
##          Month.Year Local.Govt.Area Approved.Sites Operational.Sites
## 3635   January 2010        BRISBANE            227               220
## 3690  February 2010        BRISBANE            227               220
## 3745     March 2010        BRISBANE            227               220
## 3800     April 2010        BRISBANE            227               221
## 3855       May 2010        BRISBANE            228               222
## 3910      June 2010        BRISBANE            227               222
## 3965      July 2010        BRISBANE            226               219
## 4020    August 2010        BRISBANE            226               218
## 4075 September 2010        BRISBANE            225               218
## 4130   October 2010        BRISBANE            225               218
## 4185  November 2010        BRISBANE            225               218
## 4240  December 2010        BRISBANE            225               217
##      Approved.EGMs Operational.EGMs Player.Money.Lost       Date
## 3635          9183             8834          31268720 2010-01-01
## 3690          9175             8854          30025451 2010-02-01
## 3745          9225             8859          32183381 2010-03-01
## 3800          9345             8956          32017037 2010-04-01
## 3855          9230             8815          32244843 2010-05-01
## 3910          9166             8872          31873072 2010-06-01
## 3965          9144             8809          36225638 2010-07-01
## 4020          9119             8791          36861039 2010-08-01
## 4075          9106             8812          34763792 2010-09-01
## 4130          9106             8799          36211785 2010-10-01
## 4185          9126             8830          33534227 2010-11-01
## 4240          9126             8797          35019142 2010-12-01

We can traverse each row of the data frame.

# for each row in brisbane.2010.data, print the player money lost
numrows <- nrow(brisbane.2010.data)
for (i in 1:numrows) {
  print(brisbane.2010.data[i, "Player.Money.Lost"])
}
## [1] 31268720
## [1] 30025451
## [1] 32183381
## [1] 32017037
## [1] 32244843
## [1] 31873072
## [1] 36225638
## [1] 36861039
## [1] 34763792
## [1] 36211785
## [1] 33534227
## [1] 35019142

Or each column of the data frame.

# for each column in brisbane.2010.data, print the data type using the class() function
numcols <- ncol(brisbane.2010.data)
for (i in 1:numcols) {
  columndata <- brisbane.2010.data[, i]
  print(class(columndata))
}
## [1] "character"
## [1] "character"
## [1] "integer"
## [1] "integer"
## [1] "integer"
## [1] "integer"
## [1] "numeric"
## [1] "POSIXlt" "POSIXt"
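
The column loop above can also be written without an explicit loop. The sketch below uses base R's sapply() on a small made-up data frame (toy.data is hypothetical and stands in for brisbane.2010.data).

```r
# Small made-up data frame standing in for brisbane.2010.data
toy.data <- data.frame(
  Month.Year = c("January 2010", "February 2010"),
  Approved.Sites = c(227L, 227L),
  Player.Money.Lost = c(31268720, 30025451),
  stringsAsFactors = FALSE
)

# sapply() calls class() on each column and collects the results
column.classes <- sapply(toy.data, class)
print(column.classes)
```

When a column has more than one class, as the POSIXlt Date column does, sapply() returns a list rather than a character vector.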

For loops and if statements

We can incorporate our knowledge of if..else statements with for-loops.

# for each row, print those where the Player.Money.Lost is greater than 32 million
numrows <- nrow(brisbane.2010.data)
for (i in 1:numrows) {
  if (brisbane.2010.data[i, "Player.Money.Lost"] > 32000000) {
    print(brisbane.2010.data[i, "Player.Money.Lost"])
  }
}
## [1] 32183381
## [1] 32017037
## [1] 32244843
## [1] 36225638
## [1] 36861039
## [1] 34763792
## [1] 36211785
## [1] 33534227
## [1] 35019142

Quiz

For each row of the brisbane.2010.data data frame, write code to test whether the Player.Money.Lost value is greater than 32 million. If so, print “Greater than 32 million”. Otherwise, print “Not greater than 32 million”.

numrows <- nrow(brisbane.2010.data)
for (i in 1:numrows) {
  # write your code here 
}

While loop

A while loop repeats its body for as long as its condition evaluates to TRUE.

i <- 1
while (i < 10) {
  print(i)
  i <- i + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9

Make sure a while loop has a way to exit; otherwise, it will run forever. The loop below is an infinite loop because i is never incremented, so the condition i < 10 is always TRUE.

i <- 1
while (i < 10) {
  print(i)
}

Break statement

A break statement stops a loop immediately and passes control to the first statement after the loop.

x <- 1:10
for (i in x) {
  if (i == 2) {
    break
  }
  print(i)
}
## [1] 1

Next statement

A next statement skips the rest of the current iteration and moves on to the next one, without terminating the loop.

x <- 1:4
for (i in x) {
  if (i == 2) {
    next
  }
  print(i)
}
## [1] 1
## [1] 3
## [1] 4
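
The repeat statement, listed at the start of this section, is not demonstrated above. A minimal sketch, assuming we simply want to count from 1 to 3:

```r
i <- 1
repeat {
  print(i)
  i <- i + 1
  # repeat has no condition of its own, so break is the only way out
  if (i > 3) {
    break
  }
}
```

Because the exit test sits inside the body, the body of a repeat loop always runs at least once.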

User-defined functions

If you have to repeat the same few lines of code more than once, consider writing a function. Functions are a fundamental building block of R: you use them all the time, and it is not much harder to string functions together (or write entirely new ones from scratch) to do more.

  • R functions are objects just like anything else.
  • By default, R function arguments are lazy: they are only evaluated if they are actually used.
  • Almost every operation on an R object is a function call.
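
The first two points can be seen in a short sketch. The function firstOnly and the list operations are hypothetical names made up for this illustration.

```r
# The second argument is never used inside the function,
# so it is never evaluated
firstOnly <- function(a, b) {
  return(a)
}

# stop() would raise an error if it were evaluated, but it is not
result <- firstOnly(1, stop("this is never evaluated"))
print(result)

# Functions are objects: they can be stored in a list and called later
operations <- list(total = sum, average = mean)
print(operations$average(c(2, 4, 6)))
```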

Basic components of a function

  • The body(), the code inside the function.
  • The formals(), the “formal” argument list, which controls how you can call the function.
  • The environment() which determines how variables referred to inside the function are found.
  • args(), which displays the function’s argument list.

myFunction <- function(parameter1, parameter2) {
  # do something with parameter1 and parameter2
  value <- parameter1 + parameter2
  return(value)
}
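
The accessor functions listed above can be tried on myFunction directly. This is a small sketch; what environment() prints depends on where the function was defined.

```r
myFunction <- function(parameter1, parameter2) {
  value <- parameter1 + parameter2
  return(value)
}

print(body(myFunction))            # the code inside the braces
print(names(formals(myFunction)))  # the names of the formal arguments
print(environment(myFunction))     # the environment the function was defined in
print(args(myFunction))            # the argument list, shown as a function header
```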

Functions should have good descriptive names

# Create a function to perform addition
add <- function(num1, num2) {
  return(num1 + num2)
}

# Call the add function
add(2, 5)
## [1] 7

Example 1 : Loop through brisbane.2010.data and sum up the Player.Money.Lost

total.money.lost <- 0

for (i in 1:nrow(brisbane.2010.data)) {
  total.money.lost <- total.money.lost + brisbane.2010.data[i, "Player.Money.Lost"]
}

print(total.money.lost)
## [1] 402228128

Example 2 : Create a function to sum up the Player.Money.Lost column from any data frame

sumMoneyLost <- function(some.gambling.dataframe){
  
  total.money.lost <- 0

  for (i in 1:nrow(some.gambling.dataframe)) {
    total.money.lost <- total.money.lost + some.gambling.dataframe[i, "Player.Money.Lost"]
  }
  return(total.money.lost)
}

# Call the sumMoneyLost function
sumMoneyLost(brisbane.2010.data)
## [1] 402228128

With user-defined functions, you don’t need to repeat your code over and over again. You can simply call the function on another dataset.

sumMoneyLost(brisbane.only)
## [1] 5624593215

If, however, there is already a function to perform what you need, save time and use that instead.

# This returns NA because Player.Money.Lost contains missing values
sumMoneyLost(gambling.data)
## [1] NA

# Use the built-in sum() function instead, which has an na.rm option
sum(gambling.data$Player.Money.Lost, na.rm = TRUE)
## [1] 25193280820
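
For comparison, here is a sketch of how a user-defined function could offer the same option. The name sumMoneyLostNaRm, its na.rm argument and the toy data are assumptions made for illustration; in practice sum() remains the simpler choice.

```r
# Hypothetical variant of sumMoneyLost with an na.rm argument
sumMoneyLostNaRm <- function(some.gambling.dataframe, na.rm = FALSE) {
  total.money.lost <- 0
  for (i in seq_len(nrow(some.gambling.dataframe))) {
    value <- some.gambling.dataframe[i, "Player.Money.Lost"]
    # Skip missing values only when the caller asks for it
    if (na.rm && is.na(value)) {
      next
    }
    total.money.lost <- total.money.lost + value
  }
  return(total.money.lost)
}

# Made-up data with one missing value
toy.data <- data.frame(Player.Money.Lost = c(10, NA, 5))
print(sumMoneyLostNaRm(toy.data))                # NA
print(sumMoneyLostNaRm(toy.data, na.rm = TRUE))  # 15
```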

Adapted from:

https://ramnathv.github.io/pycon2014-r/learn/controls.html

https://www.r-bloggers.com/control-structures-loops-in-r/

Graphics

Introduction to ggplot2

  • ggplot2 is a data visualization package for the statistical programming language R.
  • It was created by Hadley Wickham in 2005, while he was a graduate student at Iowa State University.
  • It is based on the Grammar of Graphics (Leland Wilkinson, 2005), in which a plot is built from a set of independent components that can be combined in many different ways.
  • Other data visualization systems in R include base graphics and lattice.

Grammar Of Graphics

Just as the grammar of the English language combines components such as verbs, nouns, adjectives and articles to form a sentence, the Grammar of Graphics specifies plot building blocks independently and combines them to create just about any kind of graphical display you want. Building blocks of a graph include:

  • data
  • aesthetic mapping
  • geometric object
  • statistical transformations
  • scales
  • coordinate system
  • position adjustments
  • faceting

Advantages of ggplot2

  • consistent underlying grammar of graphics (Wilkinson, 2005)
  • plot specification at a high level of abstraction
  • very flexible
  • theme system for polishing plot appearance
  • mature and complete graphics system
  • many users, active mailing list

What ggplot2 cannot do

  • 3D visualizations (see the “rgl” package)
  • graph theory type of graphs (node/edges; see “igraph” package)
  • interactive graphics (see “ggvis” package)

Key components

Every ggplot2 plot has three key components:

  1. Data,
  2. A set of aesthetic mappings between variables in the data and visual properties, and
  3. At least one layer that describes how to render each observation. Layers are usually created with a geom function.
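
The three components can be seen by building a plot in two steps. This is a minimal sketch that assumes ggplot2 is installed; toy.data and its values are made up for illustration.

```r
library(ggplot2)

# Made-up data, used only to illustrate the three components
toy.data <- data.frame(Year = 2010:2014,
                       Avg.Money.Lost = c(3.1, 3.4, 3.3, 3.8, 4.0))

# Components 1 and 2: data and aesthetic mappings.
# There is no layer yet, so printing this object would show empty axes.
base.plot <- ggplot(toy.data, aes(x = Year, y = Avg.Money.Lost))

# Component 3: a layer, created with a geom function, says how to draw each row
point.plot <- base.plot + geom_point()
print(point.plot)
```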

Website : http://ggplot2.org

library(ggplot2)
library(scales)

Create Aggregate Sets

Aggregate gambling.data by Local.Govt.Area & by Year

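gambling.data <- na.omit(gambling.data)

# Derive a Year column from Date
# (assumption: the original does not show how the Year column was created)
gambling.data$Year <- as.integer(format(gambling.data$Date, "%Y"))

# Aggregate dataset by taking the mean of Player.Money.Lost for each Year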

gambling.avg.year <- aggregate(gambling.data$Player.Money.Lost, 
                         by=list(Year = gambling.data$Year), 
                         FUN=mean,
                         na.rm = TRUE)
names(gambling.avg.year)[names(gambling.avg.year) == 'x'] <- 'Avg.Money.Lost'

# Aggregate dataset by taking mean of Player.Money.Lost By Local.Govt.Area

gambling.avg.LGA <- aggregate(gambling.data$Player.Money.Lost, 
                         by=list(Local.Govt.Area = gambling.data$Local.Govt.Area), 
                         FUN=mean,
                         na.rm = TRUE)

names(gambling.avg.LGA)[names(gambling.avg.LGA) == 'x'] <- 'Avg.Money.Lost'
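
To see why the renaming step is needed, here is the same pattern on a small made-up data frame (toy.data and toy.avg are hypothetical). When aggregate() is given a plain vector, the aggregated values come back in a column called x.

```r
# Made-up data: two areas with two monthly values each
toy.data <- data.frame(
  Local.Govt.Area = c("A", "A", "B", "B"),
  Player.Money.Lost = c(10, 20, 100, 300)
)

toy.avg <- aggregate(toy.data$Player.Money.Lost,
                     by = list(Local.Govt.Area = toy.data$Local.Govt.Area),
                     FUN = mean)

# The grouping column keeps its name; the aggregated values are in "x"
print(names(toy.avg))

# Rename "x" to something descriptive
names(toy.avg)[names(toy.avg) == "x"] <- "Avg.Money.Lost"
print(toy.avg)
```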

Scatter plot

Let’s take a look at a simple example of a scatter plot.

Basic Scatter plot

# Scatter plot of Year and average money lost
ggplot(gambling.avg.year,
       aes(x = Year,
           y = Avg.Money.Lost)) +
  geom_point()

This produces a scatterplot defined by:

  1. Data: gambling.avg.year
  2. Aesthetic mapping: Year mapped to x position, Avg.Money.Lost to y position (aes() function).
  3. Layer: points (geom_point() function)

Scatter plot with colour, title

  • Here colour="red" sets the colour of the points.

  • ggtitle() sets the title of the plot.

# Scatter plot of Year and average money lost, with a colour attribute for the points

ggplot(gambling.avg.year,
       aes(x=Year,
           y=Avg.Money.Lost)) +
  geom_point(colour="red") +
  ggtitle("Year and Average Money Lost")

Scatter plot with smoothing layer

Now we add a smoothing layer using geom_smooth(method = "lm"). Because the method is set to lm (short for linear model), it draws the line of best fit, shown in blue. By default (se = TRUE), a shaded confidence band is drawn around the line.

# Scatter plot of Year and average money lost with geom_smooth()
ggplot(gambling.avg.year,
       aes(x = Year,
           y = Avg.Money.Lost)) +
  geom_point(colour = "red") +
  ggtitle("Year and Average Money Lost") +
  geom_smooth(method = "lm")

Barplot

Let’s create a barplot of the average money lost for each Local.Govt.Area, using the whole gambling dataset.

  • To make the heights of the bars represent values in the data, use stat="identity" and map a value to the y aesthetic.

  • theme() controls the appearance of all non-data components of the plot.

  • axis.text.x refers to the tick labels on the x axis.

  • element_text() specifies how text elements are drawn.

  • angle=90 rotates the x axis labels by 90 degrees.

  • hjust=1 right-justifies the rotated labels, so they all end at the top, next to the axis.

# Barplot of average money lost for each LGA
ggplot(gambling.avg.LGA,
       aes(x = Local.Govt.Area,
           y = Avg.Money.Lost)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Barplot with ordering

  • Here the reorder() function orders its first argument (Local.Govt.Area) by its second argument (Avg.Money.Lost). The negative sign in front of Avg.Money.Lost sorts the bars in decreasing order.

  • fill="blue" colours all bars blue.

  • scale_y_continuous(labels = dollar) formats the y axis labels as dollar amounts, using the dollar formatter from the scales package.

ggplot(gambling.avg.LGA,
       aes(x=reorder(Local.Govt.Area,-Avg.Money.Lost),
           y=Avg.Money.Lost)) +
      geom_bar(stat="identity", fill="blue") + 
      theme(axis.text.x = element_text(angle=90,hjust=1)) +
      scale_y_continuous(labels = dollar) + 
      xlab("Local Government Area") +
      ylab("Average Money Lost")

Box plot

# Box plots for whole dataset by LGA

ggplot(gambling.data, aes(x = reorder(Local.Govt.Area, -Player.Money.Lost, FUN = median), 
                          y = Player.Money.Lost)) +
        geom_boxplot() +
        scale_y_continuous(labels = dollar) + 
        theme(axis.text.x = element_text(angle=90,hjust=1)) +
        xlab("Local Government Area") +
        ylab("Player Money Lost")

Save plot

To save a plot to a file, open a graphics device with png(), print the plot, then close the device with dev.off().

png("myplot.png")

myplot <- ggplot(gambling.avg.LGA,
           aes(x=reorder(Local.Govt.Area,-Avg.Money.Lost),
               y=Avg.Money.Lost)) +
          geom_bar(stat="identity", fill="blue") + 
          theme(axis.text.x = element_text(angle=90,hjust=1)) +
          scale_y_continuous(labels = dollar) + 
          xlab("Local Government Area") +
          ylab("Average Money Lost")

print(myplot)

dev.off()
## quartz_off_screen 
##                 2
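
ggplot2 also provides ggsave(), which opens and closes the graphics device for you. The sketch below assumes ggplot2 is installed and a PNG device is available. The data, the 6 by 4 inch size and the temporary output path are made up for illustration.

```r
library(ggplot2)

# Made-up data standing in for gambling.avg.LGA
toy.data <- data.frame(Local.Govt.Area = c("A", "B", "C"),
                       Avg.Money.Lost = c(3, 1, 2))

toy.plot <- ggplot(toy.data, aes(x = Local.Govt.Area, y = Avg.Money.Lost)) +
  geom_bar(stat = "identity")

# ggsave() picks the file type from the extension;
# width and height are in inches by default
output.file <- file.path(tempdir(), "myplot.png")
ggsave(output.file, plot = toy.plot, width = 6, height = 4)
```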

Adapted from:

http://ggplot2.org (Book)

http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html