My solution for the Fun Series October - Learning Data Science - Linear Regression challenge

Let’s first read the CSV file into a data frame:

df <- read.table("input/pts.csv", header=TRUE, sep=",", quote="")

There are 3 columns:

names(df)
## [1] "set" "x"   "y"

...and 20 sets:

str(factor(df$set))
##  Factor w/ 20 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
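
As a rough complement to `str(factor(...))`, the number of sets and the number of points in each can be counted directly. This is only a sketch: `toy` is a made-up stand-in for `df`, and its values are invented for illustration.

```r
# Toy stand-in for df (made-up values): 3 sets with 4, 2 and 3 points.
toy <- data.frame(set = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
                  x   = 1:9,
                  y   = (1:9) * 2)

n_sets <- length(unique(toy$set))   # number of distinct sets
pts_per_set <- table(toy$set)       # number of points in each set

print(n_sets)
print(pts_per_set)
```

On the real data, replacing `toy` with `df` should report the 20 sets seen above.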

Let’s build the data frame where we will store the result:

result<-data.frame()

Next we are going to loop through all the sets and fit a linear model to each one using ‘lm()’:

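for(i in 1:20){
  #for each set (i is the set number; x and y below are columns of the set)
  set <- df[df$set == i,]
  
  #calculate linear model (the formula x~y regresses x on y)
  lm_set<-lm(x~y,data = set)
  
  #add the dataset number, the slope and the intercept to the result
  result<-rbind(result,data.frame(set=i,slope=coef(lm_set)[2],intercept=coef(lm_set)[1]))
 
}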
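
The same per-set fit can also be written without growing a data frame inside a loop, by splitting the data on the set column and applying `lm()` to each piece. The sketch below is an alternative, not the code used for the submission. `toy` and `toy_result` are hypothetical names, and the toy values are invented so that the expected coefficients are easy to check by hand. The formula is kept as `x ~ y` to match the loop; if `y` is meant to be the response, it would be `y ~ x` instead.

```r
# Toy data standing in for pts.csv (made-up values, not the challenge data):
# set 1 lies exactly on y = 2x + 1, set 2 lies exactly on y = -x + 4.
toy <- data.frame(
  set = rep(1:2, each = 5),
  x   = rep(1:5, times = 2)
)
toy$y <- ifelse(toy$set == 1, 2 * toy$x + 1, -toy$x + 4)

# Split by set and fit one model per piece, keeping only the coefficients.
# As in the loop, the formula regresses x on y.
fits <- lapply(split(toy, toy$set), function(d) coef(lm(x ~ y, data = d)))

# Stack the coefficients into one data frame, one row per set.
toy_result <- data.frame(
  set       = as.integer(names(fits)),
  slope     = unname(sapply(fits, `[[`, "y")),
  intercept = unname(sapply(fits, `[[`, "(Intercept)"))
)
print(toy_result)
```

Because the toy points sit exactly on a line, the fitted values for set 1 should match x = 0.5y - 0.5, and for set 2, x = -y + 4.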

Finally, write the data frame to a CSV file:

write.table(result, file = "result.csv",row.names=FALSE, na="",col.names=FALSE, sep=",")
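
To sanity-check the output format, the file can be read back in. The sketch below uses a temporary file and a made-up `toy_result` rather than the real `result.csv`. The column names passed to `read.csv()` are assumptions, since the file is written without a header.

```r
# Toy result (made-up values) with the same column layout as `result`.
toy_result <- data.frame(set = 1:2, slope = c(0.5, -1), intercept = c(-0.5, 4))

# Write to a temporary file so nothing in the working directory is touched.
tmp <- tempfile(fileext = ".csv")
write.table(toy_result, file = tmp, row.names = FALSE, na = "",
            col.names = FALSE, sep = ",")

# Read it back; there is no header row, so the columns are named by hand.
check <- read.csv(tmp, header = FALSE,
                  col.names = c("set", "slope", "intercept"))
print(check)
unlink(tmp)
```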

Things to do: