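Control structures allow you to control the flow of execution of a script, typically inside a function. Common ones include if and else statements, for-loops, while-loops, and the break and next statements. The simplest is the if statement: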
if (condition)
{
  # do something
}
Sometimes we want to do something else when the condition of the if-statement is FALSE. In this case, we add an else statement.
if (condition)
{
  # do something
}
else
{
  # do something else
}
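Hadley Wickham has published a style guide for R (http://adv-r.had.co.nz/Style.html), which recommends a different brace layout:
“An opening curly brace should never go on its own line and should always be followed by a new line. A closing curly brace should always go on its own line, unless it’s followed by else. Always indent the code inside curly braces.”
In R this is more than a matter of style: at the top level of a script or at the console, an else that starts its own line is a syntax error, because R treats the if statement as complete once it has read the closing brace.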
if (condition) {
  # do something
} else {
  # do something else
}
An example of an if...else statement:
x <- 1
if (x > 1) {
  print("x is greater than 1")
} else {
  print("x is less than or equal to 1")
}
## [1] "x is less than or equal to 1"
To test more than two cases, chain the conditions with else if:
if (condition1) {
  # do something
} else if (condition2) {
  # do something else
} else {
  # do something different
}
For example:
x <- 1
if (x > 1) {
  print("x is greater than 1")
} else if (x < 1) {
  print("x is less than 1")
} else {
  print("x is equal to 1")
}
## [1] "x is equal to 1"
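Since control structures are typically used inside functions, the same chain can be wrapped in one. The sketch below is illustrative only: the function name compareToThreshold and its threshold argument are hypothetical and not part of the original notes.

```r
# Hypothetical helper (not from the original notes): describe how a single
# number compares with a threshold using if / else if / else.
compareToThreshold <- function(x, threshold = 1) {
  if (x > threshold) {
    result <- "greater"
  } else if (x < threshold) {
    result <- "less"
  } else {
    result <- "equal"
  }
  return(result)
}

print(compareToThreshold(1))
print(compareToThreshold(5))
print(compareToThreshold(0, threshold = 2))
```

Only one branch runs per call, and the conditions are checked from top to bottom, so the final else catches whatever the earlier tests did not.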
Write a set of if...else statements that test a variable “mhi5_score”. Print “depression” if it is less than 52 and “control” if it is greater than or equal to 52.
mhi5_score <- 45
# put your if else statements here
A for-loop iterates over a sequence, assigning each successive value to the loop variable until the end of the sequence is reached.
for (i in 1:10) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
x <- c("apples", "oranges", "bananas", "strawberries")
for (i in 1:4) {
  print(x[i])
}
## [1] "apples"
## [1] "oranges"
## [1] "bananas"
## [1] "strawberries"
For short-hand coding, we can put the body of the loop on the same line as the for-statement:
for (i in 1:4) print(x[i])
## [1] "apples"
## [1] "oranges"
## [1] "bananas"
## [1] "strawberries"
We can also simply print out each element of our vector using a for-loop to traverse it
for (i in x) print(i)
## [1] "apples"
## [1] "oranges"
## [1] "bananas"
## [1] "strawberries"
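The index-based loops above hard-code the length of the vector (1:4). As a hedged alternative, seq_along() generates the indices from the vector itself, so the loop keeps working if elements are added or removed. The empty vector below is a made-up edge case for illustration.

```r
x <- c("apples", "oranges", "bananas", "strawberries")

# seq_along(x) yields 1, 2, ..., length(x), so nothing is hard-coded
for (i in seq_along(x)) {
  print(paste(i, x[i]))
}

# With an empty vector, 1:length(empty) would count 1, 0 and run twice;
# seq_along(empty) is empty, so the loop body never runs
empty <- character(0)
for (i in seq_along(empty)) {
  print(empty[i])
}
```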
We have seen that for-loops work on vectors. They also work on data frames.
# read in the gambling data from Day 1
gambling.data <- read.csv(file = "http://data.justice.qld.gov.au/JSD/OLGR/20170817_OLGR_LGA-EGM-data.csv",
                          header = TRUE,
                          sep = ",",
                          stringsAsFactors = FALSE)
# rename columns
names(gambling.data)[2] <- "Local.Govt.Area"
names(gambling.data)[7] <- "Player.Money.Lost"
# add a day of month (1st) to each date string
date.string <- paste0("1 ", gambling.data$Month.Year)
# convert to POSIXlt, a date-time format
gambling.data$Date <- strptime(date.string, format = "%d %B %Y")
# subset to Brisbane only
brisbane.only <- gambling.data[gambling.data$Local.Govt.Area == "BRISBANE", ]
# keep the rows dated within 2010; the outer parentheses print the result
row.indices <- (brisbane.only$Date >= "2010-01-01 AEST" &
                  brisbane.only$Date <= "2010-12-31 AEST")
(brisbane.2010.data <- brisbane.only[row.indices, ])
## Month.Year Local.Govt.Area Approved.Sites Operational.Sites
## 3635 January 2010 BRISBANE 227 220
## 3690 February 2010 BRISBANE 227 220
## 3745 March 2010 BRISBANE 227 220
## 3800 April 2010 BRISBANE 227 221
## 3855 May 2010 BRISBANE 228 222
## 3910 June 2010 BRISBANE 227 222
## 3965 July 2010 BRISBANE 226 219
## 4020 August 2010 BRISBANE 226 218
## 4075 September 2010 BRISBANE 225 218
## 4130 October 2010 BRISBANE 225 218
## 4185 November 2010 BRISBANE 225 218
## 4240 December 2010 BRISBANE 225 217
## Approved.EGMs Operational.EGMs Player.Money.Lost Date
## 3635 9183 8834 31268720 2010-01-01
## 3690 9175 8854 30025451 2010-02-01
## 3745 9225 8859 32183381 2010-03-01
## 3800 9345 8956 32017037 2010-04-01
## 3855 9230 8815 32244843 2010-05-01
## 3910 9166 8872 31873072 2010-06-01
## 3965 9144 8809 36225638 2010-07-01
## 4020 9119 8791 36861039 2010-08-01
## 4075 9106 8812 34763792 2010-09-01
## 4130 9106 8799 36211785 2010-10-01
## 4185 9126 8830 33534227 2010-11-01
## 4240 9126 8797 35019142 2010-12-01
We can traverse across each row of the data frame:
# for each row in brisbane.2010.data, print the player money lost
numrows <- nrow(brisbane.2010.data)
for (i in 1:numrows) {
  print(brisbane.2010.data[i, "Player.Money.Lost"])
}
## [1] 31268720
## [1] 30025451
## [1] 32183381
## [1] 32017037
## [1] 32244843
## [1] 31873072
## [1] 36225638
## [1] 36861039
## [1] 34763792
## [1] 36211785
## [1] 33534227
## [1] 35019142
Or across each column of the data frame:
# for each column in brisbane.2010.data, print the data type using the class() function
numcols <- ncol(brisbane.2010.data)
for (i in 1:numcols) {
  columndata <- brisbane.2010.data[, i]
  print(class(columndata))
}
## [1] "character"
## [1] "character"
## [1] "integer"
## [1] "integer"
## [1] "integer"
## [1] "integer"
## [1] "numeric"
## [1] "POSIXlt" "POSIXt"
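A data frame is a list of columns, so a loop can also run over the column names rather than the column positions, which makes the printed output easier to read. The sketch below uses a small made-up data frame (toy.data) as a stand-in for brisbane.2010.data; the variable names are assumptions for illustration.

```r
# Small made-up data frame standing in for brisbane.2010.data
toy.data <- data.frame(
  Month = c("January", "February"),
  Sites = c(220L, 221L),
  Money.Lost = c(31268720, 30025451),
  stringsAsFactors = FALSE
)

# Loop over column names and record the first class of each column
column.classes <- character(0)
for (column.name in names(toy.data)) {
  column.classes[column.name] <- class(toy.data[[column.name]])[1]
}
print(column.classes)
```

Taking only the first element of class() keeps one entry per column even for types such as POSIXlt, which report more than one class.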
We can combine our knowledge of if...else statements with for-loops.
# for each row, print those where the Player.Money.Lost is greater than 32 million
numrows <- nrow(brisbane.2010.data)
for (i in 1:numrows) {
  if (brisbane.2010.data[i, "Player.Money.Lost"] > 32000000) {
    print(brisbane.2010.data[i, "Player.Money.Lost"])
  }
}
## [1] 32183381
## [1] 32017037
## [1] 32244843
## [1] 36225638
## [1] 36861039
## [1] 34763792
## [1] 36211785
## [1] 33534227
## [1] 35019142
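For comparison, R can apply a condition to a whole vector at once, which often replaces a loop containing an if-statement. This is a sketch rather than the approach used in these notes; the money.lost vector simply copies the first four Player.Money.Lost values from the 2010 output above.

```r
# Values copied from the first four rows of the 2010 output above
money.lost <- c(31268720, 30025451, 32183381, 32017037)

# Loop version: test each element in turn
for (value in money.lost) {
  if (value > 32000000) {
    print(value)
  }
}

# Vectorised version: the comparison returns TRUE or FALSE for every
# element, and the logical vector then selects the matching values
over.32.million <- money.lost[money.lost > 32000000]
print(over.32.million)
```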
For each row of the brisbane.2010.data data frame, write code to test whether the Player.Money.Lost value is greater than 32 million. If so, print “Greater than 32 million”. Otherwise, print “Not greater than 32 million”.
numrows <- nrow(brisbane.2010.data)
for (i in 1:numrows) {
  # write your code here
}
A while-loop repeats its body for as long as its condition is TRUE.
i <- 1
while (i < 10) {
  print(i)
  i <- i + 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
Make sure there is a way to exit a while-loop. Otherwise, you can get stuck in an infinite loop. Below is an example of one.
i <- 1
while (i < 10) {
  # i is never updated, so i < 10 stays TRUE forever
  print(i)
}
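One defensive pattern is to add a second condition that caps the number of iterations, so that a forgotten update cannot hang the script. This is only a sketch: the iterations counter and the max.iterations cap of 100 are hypothetical names and values, not part of the original notes.

```r
i <- 1
iterations <- 0
max.iterations <- 100  # hypothetical safety cap

# The loop stops as soon as either condition becomes FALSE
while (i < 10 && iterations < max.iterations) {
  iterations <- iterations + 1
  # imagine we forgot to update i here
}
print(iterations)
```

Here i never changes, so the loop ends only because the cap is reached after 100 iterations.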
A break statement stops a loop immediately and passes control to the first statement after the loop.
x <- 1:10
for (i in x) {
  if (i == 2) {
    break
  }
  print(i)
}
## [1] 1
A next statement skips the rest of the current iteration and moves on to the next one, without terminating the loop.
x <- 1:4
for (i in x) {
  if (i == 2) {
    next
  }
  print(i)
}
## [1] 1
## [1] 3
## [1] 4
If you have to repeat the same few lines of code more than once, then you really need to write a function. Functions are a fundamental building block of R. You use them all the time in R and it’s not that much harder to string functions together (or write entirely new ones from scratch) to do more.
myFunction <- function(parameter1, parameter2) {
  # do something with parameter1 and parameter2
  value <- parameter1 + parameter2
  return(value)
}
Functions should have good, descriptive names.
# Create a function to perform addition
add <- function(num1, num2) {
  return(num1 + num2)
}
# Call the add function
add(2,5)
## [1] 7
As an example, the following loop adds up Player.Money.Lost across every row of brisbane.2010.data.
total.money.lost <- 0
for (i in 1:nrow(brisbane.2010.data)) {
  total.money.lost <- total.money.lost + brisbane.2010.data[i, "Player.Money.Lost"]
}
print(total.money.lost)
## [1] 402228128
The same loop can be wrapped in a function so that it works on any data frame with a Player.Money.Lost column.
sumMoneyLost <- function(some.gambling.dataframe) {
  total.money.lost <- 0
  for (i in 1:nrow(some.gambling.dataframe)) {
    total.money.lost <- total.money.lost + some.gambling.dataframe[i, "Player.Money.Lost"]
  }
  return(total.money.lost)
}
# Call the sumMoneyLost function
sumMoneyLost(brisbane.2010.data)
## [1] 402228128
With user-defined functions, you won’t need to repeat your code over and over again. You can just call the function on another dataset.
sumMoneyLost(brisbane.only)
## [1] 5624593215
If, however, there is already a function that does what you need, save time and use that instead.
# This returns NA because Player.Money.Lost contains missing values in the full dataset
sumMoneyLost(gambling.data)
## [1] NA
# We will use the sum() function instead which has a na.rm option
sum(gambling.data$Player.Money.Lost, na.rm = TRUE)
## [1] 25193280820
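The difference between the two results can be reproduced on a tiny vector. The values below are made up for illustration; only sum() and its na.rm argument come from the notes above.

```r
# Made-up values with one missing entry, standing in for Player.Money.Lost
money.lost <- c(100, NA, 250)

# Adding NA to a running total turns the whole total into NA
running.total <- 0
for (value in money.lost) {
  running.total <- running.total + value
}
print(running.total)

# The built-in sum() can drop the missing values before adding
print(sum(money.lost, na.rm = TRUE))
```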
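The grammar of the English language has components such as verbs, nouns, adjectives and articles, which are combined to form a sentence.
In the same way, the Grammar of Graphics specifies plot building blocks independently and combines them to create just about any kind of graphical display you want. Building blocks of a graph include data, aesthetic mappings, geometric objects, statistical transformations, scales, coordinate systems, position adjustments and faceting.
Every ggplot2 plot has three key components: data, a set of aesthetic mappings between variables in the data and visual properties, and at least one layer that describes how to render each observation (usually created with a geom function).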
Website : http://ggplot2.org
library(ggplot2)
library(scales)
## [1] "data.frame"
Aggregate gambling.data by Local.Govt.Area and by Year
# Derive a Year column from Date (assumption: it was not already created on Day 1)
gambling.data$Year <- as.integer(format(gambling.data$Date, format = "%Y"))
gambling.data <- na.omit(gambling.data)
# Aggregate dataset by taking the mean of Player.Money.Lost for each Year
gambling.avg.year <- aggregate(gambling.data$Player.Money.Lost,
                               by=list(Year = gambling.data$Year),
                               FUN=mean,
                               na.rm = TRUE)
names(gambling.avg.year)[names(gambling.avg.year) == 'x'] <- 'Avg.Money.Lost'
# Aggregate dataset by taking the mean of Player.Money.Lost for each Local.Govt.Area
gambling.avg.LGA <- aggregate(gambling.data$Player.Money.Lost,
                              by=list(Local.Govt.Area = gambling.data$Local.Govt.Area),
                              FUN=mean,
                              na.rm = TRUE)
names(gambling.avg.LGA)[names(gambling.avg.LGA) == 'x'] <- 'Avg.Money.Lost'
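To see what aggregate() returns, it can help to run the same call on a data frame small enough to check by hand. The toy.data values below are made up; the call mirrors the one above, including the rename of the default x column.

```r
# Made-up data: two areas with two monthly values each
toy.data <- data.frame(
  Local.Govt.Area = c("A", "A", "B", "B"),
  Player.Money.Lost = c(10, 20, 30, 50),
  stringsAsFactors = FALSE
)

# One row per group; the aggregated values arrive in a column named x
toy.avg <- aggregate(toy.data$Player.Money.Lost,
                     by = list(Local.Govt.Area = toy.data$Local.Govt.Area),
                     FUN = mean)
names(toy.avg)[names(toy.avg) == "x"] <- "Avg.Money.Lost"
print(toy.avg)
```

The result has one row for area A with a mean of 15 and one row for area B with a mean of 40.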
Let’s take a look at a simple scatter plot.
# Scatter plot of Year and average money lost
ggplot(gambling.avg.year,
       aes(x=Year,
           y=Avg.Money.Lost)) +
  geom_point()
This produces a scatterplot defined by:
Data: gambling.avg.year
Aesthetic mapping: Year mapped to the x position, Avg.Money.Lost mapped to the y position
Layer: points
In the next plot, colour="red" sets the colour of the points and ggtitle() sets the title of the plot.
# Scatter plot of Year and average money lost, with a colour attribute for the points
ggplot(gambling.avg.year,
       aes(x=Year,
           y=Avg.Money.Lost)) +
  geom_point(colour="red") +
  ggtitle("Year and Average Money Lost")
Now we add a smoothing layer using geom_smooth(method='lm'). Since the method is set to lm (short for linear model), it draws the line of best fit, shown in blue.
# Scatter plot of Year and average money lost, with geom_smooth()
ggplot(gambling.avg.year,
       aes(x=Year,
           y=Avg.Money.Lost)) +
  geom_point(colour="red") +
  ggtitle("Year and Average Money Lost") +
  geom_smooth(method='lm')
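The line that geom_smooth(method='lm') draws is a linear model fitted with the formula y ~ x. As a rough sketch of what happens behind the scenes, the same kind of fit can be produced directly with lm(). The yearly averages below are made up so that the answer is easy to check.

```r
# Made-up yearly averages, for illustration only
toy.avg <- data.frame(
  Year = c(2010, 2011, 2012, 2013),
  Avg.Money.Lost = c(10, 12, 14, 16)
)

# Fit Avg.Money.Lost as a linear function of Year (y ~ x)
fit <- lm(Avg.Money.Lost ~ Year, data = toy.avg)

# The coefficients are the intercept and slope of the line of best fit
print(coef(fit))
```

In this toy data the average rises by 2 each year, so the fitted slope for Year is 2.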
Let’s create a barplot of the average money lost for each Local.Govt.Area, using the whole gambling dataset.
If you want the heights of the bars to represent values in the data, use stat="identity" and map a value to the y aesthetic.
theme() allows you to control the appearance of all non-data components.
axis.text.x refers to the tick labels along the x axis.
element_text() specifies how text elements are displayed.
angle=90 rotates the x axis labels by 90 degrees.
hjust=1 right-justifies the x axis labels, so that after rotation their ends line up against the axis.
# Barplot of average money lost for each LGA
ggplot(gambling.avg.LGA,
       aes(x=Local.Govt.Area,
           y=Avg.Money.Lost)) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle=90,hjust=1))
Here the reorder() function orders its first argument (Local.Govt.Area) by the values of its second argument (Avg.Money.Lost). The negative sign in front of Avg.Money.Lost sorts the bars in decreasing order.
fill="blue" colours all bars in the specified colour.
scale_y_continuous(labels = dollar) formats the y axis labels as dollar amounts, using the dollar function from the scales package.
ggplot(gambling.avg.LGA,
       aes(x=reorder(Local.Govt.Area,-Avg.Money.Lost),
           y=Avg.Money.Lost)) +
  geom_bar(stat="identity", fill="blue") +
  theme(axis.text.x = element_text(angle=90,hjust=1)) +
  scale_y_continuous(labels = dollar) +
  xlab("Local Government Area") +
  ylab("Average Money Lost")
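The effect of reorder() is easiest to see outside a plot. It returns a factor whose levels are sorted by the second argument, and ggplot2 draws the bars in level order. The area names and averages below are made up for illustration.

```r
areas <- c("A", "B", "C")      # made-up area names
avg.lost <- c(100, 300, 200)   # made-up averages

# Levels sorted by increasing value
increasing <- levels(reorder(areas, avg.lost))
print(increasing)

# Negating the values reverses the order, giving decreasing values
decreasing <- levels(reorder(areas, -avg.lost))
print(decreasing)
```

The first call orders the levels as A, C, B and the second as B, C, A.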
# Box plots for the whole dataset by LGA, ordered by decreasing median
ggplot(gambling.data, aes(x = reorder(Local.Govt.Area, -Player.Money.Lost, FUN = median),
                          y = Player.Money.Lost)) +
  geom_boxplot() +
  scale_y_continuous(labels = dollar) +
  theme(axis.text.x = element_text(angle=90,hjust=1)) +
  xlab("Local Government Area") +
  ylab("Player Money Lost")
To save a plot to a file, open a png device, print the plot, then close the device with dev.off().
png("myplot.png")
myplot <- ggplot(gambling.avg.LGA,
                 aes(x=reorder(Local.Govt.Area,-Avg.Money.Lost),
                     y=Avg.Money.Lost)) +
  geom_bar(stat="identity", fill="blue") +
  theme(axis.text.x = element_text(angle=90,hjust=1)) +
  scale_y_continuous(labels = dollar) +
  xlab("Local Government Area") +
  ylab("Average Money Lost")
print(myplot)
dev.off()
## quartz_off_screen
## 2
Adapted from: http://ggplot2.org (Book)