The aim of this document is to provide all the analyses for Experiment 2 Overlap all 3. In this experiment, participants were presented with displays of 4 words, among which were the target, a high orthographic competitor, a low orthographic competitor, and a distractor. The structure of the document will be as follows:
Data Cleaning and Pre-processing
Exploratory Analyses
Growth Curve Analysis
Graphs
Load the required libraries
library(ggplot2)
library(lme4)
library(lmerTest)
library(dplyr)
library(parallel)
library(doParallel)
library(tidyr)
library(gridExtra)
library(caret)
First, I load the data file indicating the trials that were removed for each subject.
deletedTrials<-read.csv("deleted.csv")
Now I load up the csv files with Exp1 data. There are 4 files (Exp2Part1 and Exp2Part2, Exp2Part3 and sub 11spe2). Not all colums in those files are required. Only the following colums will be loaded in R, so as to preserve memory space. The colums are: RECORDING_SESSION_LABEL, IA_FIRST_RUN_START_TIME, IA_FIRST_RUN_START_TIME, IA_FIRST_RUN_END_TIME, IA_SECOND_RUN_START_TIME, IA_SECOND_RUN_END_TIME, IA_THIRD_RUN_START_TIME, IA_THIRD_RUN_END_TIME, IA_LABEL, RESPONSE_ACC, IA_DWELL_TIME, TRIAL_INDEX, target, RESPONSE_RT,trialtype,target, list, oldnewitem
Part1 <- read.csv("Exp2Part1.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]
#now load up part 2
Part2 <- read.csv("Exp2Part2.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]
Part3 <- read.csv("Exp2Part3.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]
#combine part 1 and part 2 into one dataframe
EyeData<-rbind(Part1, Part2, Part3)
Right now we have a dataset with all the subjects that took part in Experiment 2. In the code below, I perform some preprocessing that aims to do the following: Remove bad trials (e.g., trials with a lot of off-screen fixations or trials in which the fixations don’t start at the center of the screen or trials in which the tracking of the eye was not very good. The decision about which trials should be removed was done by two graduate students, Stas and Julie.) The deleted trials can be found in the deletedTrials file.
#this part of the code is not very effective; need to figure out later how to optimize it.
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="17spe2" & EyeData$TRIAL_INDEX==45),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="15spe2" & EyeData$TRIAL_INDEX==85),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="9sre2" & EyeData$TRIAL_INDEX==21),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="1tle2" & EyeData$TRIAL_INDEX==69),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="9spe2" & EyeData$TRIAL_INDEX==18),]
20/12652
## [1] 0.001580778
Only 5 trials have been deleted, which constitutes less than .158% of the total number of trials.
In the code below, I transform the NAs to zero, calculate the RT from the onset of the word display till the mouse click time is made.
#change all the NAs to zeros
EyeData[is.na(EyeData)] <- 0
#Calculate RT from the display onset until the mouse click is made
EyeData$RESPONSE_RT<-EyeData$MOUSE_TIME-EyeData$DISPLAY_TIME
#check the dimentions of the dataset
dim(EyeData)
## [1] 12652 18
#remove the MOUSE_TIME and DISPLAY_TIME variables
EyeData<-EyeData %>% select (1:16)
#rename the RECORDING_SESSION_LABEL into Subject
colnames(EyeData)[1] <- "Subject"
#examine colum names
names(EyeData)
## [1] "Subject" "IA_FIRST_RUN_START_TIME"
## [3] "IA_FIRST_RUN_START_TIME.1" "IA_FIRST_RUN_END_TIME"
## [5] "IA_SECOND_RUN_START_TIME" "IA_SECOND_RUN_END_TIME"
## [7] "IA_THIRD_RUN_START_TIME" "IA_THIRD_RUN_END_TIME"
## [9] "IA_LABEL" "RESPONSE_ACC"
## [11] "IA_DWELL_TIME" "TRIAL_INDEX"
## [13] "target" "RESPONSE_RT"
## [15] "trialtype" "oldnewitem"
Examine the structure of the dataframe
str(EyeData)
## 'data.frame': 12652 obs. of 16 variables:
## $ Subject : Factor w/ 36 levels "10spe2","11spe2",..: 8 8 8 8 8 8 8 8 8 8 ...
## $ IA_FIRST_RUN_START_TIME : num 2775 2497 0 0 2974 ...
## $ IA_FIRST_RUN_START_TIME.1: num 2775 2497 0 0 2974 ...
## $ IA_FIRST_RUN_END_TIME : num 3962 2726 0 0 4030 ...
## $ IA_SECOND_RUN_START_TIME : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IA_SECOND_RUN_END_TIME : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IA_THIRD_RUN_START_TIME : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IA_THIRD_RUN_END_TIME : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IA_LABEL : Factor w/ 4 levels "CompetitorHigh ",..: 3 1 2 4 3 1 2 4 3 1 ...
## $ RESPONSE_ACC : int 1 1 1 1 1 1 1 1 1 1 ...
## $ IA_DWELL_TIME : int 1170 210 0 0 1057 223 451 176 950 0 ...
## $ TRIAL_INDEX : int 1 1 1 1 2 2 2 2 3 3 ...
## $ target : Factor w/ 88 levels "bait","bead",..: 41 41 41 41 8 8 8 8 63 63 ...
## $ RESPONSE_RT : num 2249 2249 2249 2249 2312 ...
## $ trialtype : Factor w/ 4 levels "Exp","FillerRel",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ oldnewitem : Factor w/ 3 levels "New","Old","x": 3 3 3 3 3 3 3 3 3 3 ...
Subset the experimental trials. This will remove all other trialtypes, such as FillerRel, FillerUnrel and Practice trials
EyeData <- EyeData %>% filter(trialtype=="Exp")
Check subject accuracy
SubjectAccuracy<- EyeData %>% group_by(Subject) %>%
summarise(MeanAccuracy=mean(RESPONSE_ACC)) %>%
arrange(MeanAccuracy)
SubjectAccuracy
## Source: local data frame [36 x 2]
##
## Subject MeanAccuracy
## (fctr) (dbl)
## 1 3spe2 0.8888889
## 2 8sre2 0.9259259
## 3 15spe2 0.9615385
## 4 14spe2 0.9629630
## 5 10spe2 1.0000000
## 6 11spe2 1.0000000
## 7 1spe2 1.0000000
## 8 1sse2 1.0000000
## 9 2sre2 1.0000000
## 10 2sse2 1.0000000
## .. ... ...
mean(SubjectAccuracy$MeanAccuracy)
## [1] 0.9927588
Subject accuracy is very high. Allmost all the subjects have 100% accuracy. Only 4 participants have a few incorrect trials. One subject however has relatively low accuracy (.89)
Create several variables that examine individual differences. Divide the subjects into fast or slow based on their RT performance for experimental trials. |
r # library(plyr) # EyeData<- ddply(EyeData, c("Subject"), # transform, SubjectRT=mean(RESPONSE_RT)) # # #create a new variable that has the levels fast or slow. Participants below the median are fast and above are slow # #the median is # EyeData$SubjectType<-ifelse(EyeData$SubjectRT>=median(EyeData$SubjectRT), "Slow","Fast") # detach("package:plyr") |
Remove trials with incorrect accuracy
CorrectEyeData<-EyeData %>% filter(RESPONSE_ACC==1)
#number of rows deleted
nrow(EyeData)-nrow(CorrectEyeData) #28 rows get deleted, which corresponds to 7 incorrect trials
## [1] 28
28/3868 #.72% of experimental trials
## [1] 0.007238883
Set up the start time and the end time dummy-coding of each run in the interest area. This creates colums where if a fixation exists during a run, then the run gets 1 (exists) or if there is no fixation in the IA during a run, then the coding is 0 (not exist)
CorrectEyeData$Fststart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)), 0)
CorrectEyeData$Fstend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_END_TIME)), 0)
CorrectEyeData$Secstart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)), 0)
CorrectEyeData$Secend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_END_TIME)), 0)
CorrectEyeData$Thirdstart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)), 0)
CorrectEyeData$Thirdend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_END_TIME)), 0)
Generate time bins from time 0 to time 6000 in 25 ms bins and assign them to the dataset
time <- seq(0, 6000, by=25)
tmatrix <- matrix(nrow=nrow(CorrectEyeData), ncol=length(time))
dim(tmatrix)
## [1] 3840 241
Generate time vectors for each row and column for first, second, and third pass viewings so that viewing periods receive a viewing probability value of 1
registerDoParallel(3)
for(i in 1:nrow(tmatrix)) {
for(j in 1:length(time)) {
tmatrix[i,j] <- ifelse(CorrectEyeData$Fststart[i] < time[j] &
CorrectEyeData$Fstend[i] > time[j] |CorrectEyeData$Secstart[i] <
time[j] & CorrectEyeData$Secend[i] > time[j] | CorrectEyeData$Thirdstart[i]
< time[j] & CorrectEyeData$Thirdend[i]>time[j], 1,0)
}
}
Combine the CleanEyeData with the time matrix
CleanData <- cbind(CorrectEyeData, data.frame(tmatrix))
Assign time values to time bin columns
colnames(CleanData)[23:263] <- seq(0, 6000, by=25)
Subset the dataset with only the necessary colums and remove anything in the memory that might be a memory hog
CleanData <- CleanData[, -c(2:8,10,11,12,13,14, 17:22)]
Put the data in long-format and then calculate the proportion for each time bin for each subject
CleanData<- CleanData %>% gather(time,value,5:245)
#find proportion for each interest area for each subject
CleanDataProb<-CleanData %>% group_by(Subject,IA_LABEL,time) %>%
summarise(Prob=mean(value))
CleanDataProb$time<-as.numeric(as.character(CleanDataProb$time))
The aim of this section is to explore pattern in the data without commiting to any formal modeling.
which Competitor is the stronger one?
#subset the required dataset by averaging over subjects; find the usual
OnlySubjects<-CleanData %>%
group_by(IA_LABEL, time) %>%
summarise(n=n(),Prob=mean(value),sd=sd(value)) %>%
mutate(se=sd/sqrt(n), LCI=Prob+qnorm(0.025)*se, UCI=Prob+qnorm(0.975)*se)
OnlySubjects$time<-as.numeric(as.character(OnlySubjects$time))
levels(OnlySubjects$IA_LABEL) <- list(Target="TARGET_IA ", `High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ", `Distractor`="UNREL2_IA ")
colnames(OnlySubjects)[1]<-"Interest Area"
#plot for time 0 to time 6000
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
g1<-ggplot(OnlySubjects, aes(x=time, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=LCI, ymax=UCI)) +
scale_x_continuous(breaks=seq(0,6000,500), limits=c(0,6000), name="time from trial onset (ms)")+
geom_vline(xintercept = 1825)+
geom_vline(xintercept = 3425)+theme_bw()+
scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
scale_y_continuous(name="Fixation Proportion")+
theme(legend.text = element_text(size = 12))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(axis.title.x = element_text(face="bold", size=15), axis.text.x = element_text(size=10))+
theme(axis.title.y = element_text(face="bold", size=15), axis.text.y = element_text(size=10))+
theme(strip.text.x = element_text(size=12, face="bold"))+
theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+theme_bw()
g1
ggsave(
"Plot1Exp2.png",
g1,
width = 9.25,
height = 6.25,
dpi = 300
)
OnlySubjects$time2<-OnlySubjects$time-1825
g2<-ggplot(OnlySubjects, aes(x=time2, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=LCI, ymax=UCI))+
scale_x_continuous(breaks=seq(0,1300,50), limits=c(0,1300), name="time from spoken target onset(ms)")+
theme_bw()+
scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
scale_y_continuous(name="Fixation Proportion")+
theme(legend.text = element_text(size = 12))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(axis.title.x = element_text(face="bold", size=15), axis.text.x = element_text(size=12, angle=90, vjust=0.5))+
theme(axis.title.y = element_text(face="bold", size=15), axis.text.y = element_text(size=12))+
theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+
theme(strip.text.x = element_text(size=12, face="bold"))
g2
#the plots shows some warnings because it removes values before display onset and after display onset
ggsave(
"Plot2Exp2.png",
g2,
width = 9.25,
height = 6.25,
dpi = 300
)
Create Graph for individual differences
#subset the required dataset by averaging over subjects; find the usual
# OnlySubjects<-CleanData %>%
# group_by(IA_LABEL, time, SubjectType) %>%
# summarise(n=n(),Prob=mean(value),sd=sd(value)) %>%
# mutate(se=sd/sqrt(n), LCI=Prob+qnorm(0.025)*se, UCI=Prob+qnorm(0.975)*se)
# OnlySubjects$time<-as.numeric(as.character(OnlySubjects$time))
#
# levels(OnlySubjects$IA_LABEL) <- list(Target="TARGET_IA ", `High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ", `Distractor`="UNREL2_IA ")
#
# colnames(OnlySubjects)[1]<-"Interest Area"
# #plot for time 0 to time 6000
# cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
#
# OnlySubjects$time2<-OnlySubjects$time-1825
#
# individual<-ggplot(OnlySubjects, aes(x=time2, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
# geom_point(size=3) +
# geom_errorbar(aes(ymin=LCI, ymax=UCI))+
# scale_x_continuous(breaks=seq(0,1300,50), limits=c(0,1300), name="time from spoken target onset(ms)")+
# theme_bw()+
# scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
# scale_y_continuous(name="Fixation Proportion")+
# theme(legend.text = element_text(size = 12))+
# theme(legend.title = element_text(size=12, face="bold"))+
# theme(legend.title = element_text(size=12, face="bold"))+
# theme(axis.title.x = element_text(face="bold", size=15), axis.text.x = element_text(size=12, angle=90, vjust=0.5))+
# theme(axis.title.y = element_text(face="bold", size=15), axis.text.y = element_text(size=12))+
# theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+
# theme(strip.text.x = element_text(size=12, face="bold"))+facet_wrap(~SubjectType)
#
# individual
#
# #the plots shows some warnings because it removes values before display onset and after display onset
#
# ggsave(
# "IndividualDifferences.png",
# individual,
# width = 9.25,
# height = 6.25,
# dpi = 300
# )
#
# Fast subjects seem to identify the target quicker, but otherwise there seem to be no differences in terms of how big the overlap effects shows up.
The following analyses will examine 1. the proportion fixations between the High and Low Competitor
First, the required subset is created:
model<-CleanDataProb %>%
filter(IA_LABEL=="CompetitorHigh " | IA_LABEL=="CompetitorLow ", time>=2025 & time <=3025)
model$IA_LABEL<-as.factor(as.character(model$IA_LABEL))
#Generate time polynomials up to quadratic poly.
t25 <- data.frame(poly(unique(model$time),4))
t25$time <- seq(2025, 3025, by=25)
#add polynomials to data frame
model <- merge(model , t25, by="time")
head(model)
## time Subject IA_LABEL Prob X1 X2 X3
## 1 2025 10spe2 CompetitorHigh 0.14814815 -0.2639818 0.3245611 -0.3568831
## 2 2025 10spe2 CompetitorLow 0.11111111 -0.2639818 0.3245611 -0.3568831
## 3 2025 11spe2 CompetitorLow 0.14814815 -0.2639818 0.3245611 -0.3568831
## 4 2025 1spe2 CompetitorHigh 0.11111111 -0.2639818 0.3245611 -0.3568831
## 5 2025 1sse2 CompetitorHigh 0.14814815 -0.2639818 0.3245611 -0.3568831
## 6 2025 2sre2 CompetitorLow 0.07407407 -0.2639818 0.3245611 -0.3568831
## X4
## 1 0.3669381
## 2 0.3669381
## 3 0.3669381
## 4 0.3669381
## 5 0.3669381
## 6 0.3669381
str(model)
## 'data.frame': 2952 obs. of 8 variables:
## $ time : num 2025 2025 2025 2025 2025 ...
## $ Subject : Factor w/ 36 levels "10spe2","11spe2",..: 1 1 2 3 4 5 6 7 7 8 ...
## $ IA_LABEL: Factor w/ 2 levels "CompetitorHigh ",..: 1 2 2 1 1 2 1 1 2 2 ...
## $ Prob : num 0.148 0.111 0.148 0.111 0.148 ...
## $ X1 : num -0.264 -0.264 -0.264 -0.264 -0.264 ...
## $ X2 : num 0.325 0.325 0.325 0.325 0.325 ...
## $ X3 : num -0.357 -0.357 -0.357 -0.357 -0.357 ...
## $ X4 : num 0.367 0.367 0.367 0.367 0.367 ...
The following models are run, with increasing complexity:
registerDoParallel(3)
#base model; Model1
Model1<-lmer(Prob ~ IA_LABEL +
(1| Subject) +
(1| Subject:IA_LABEL),
data=model, REML=T)
#linear; model 2
Model2<-lmer(Prob ~ IA_LABEL*X1 +
(1+X1| Subject) +
(1+X1| Subject:IA_LABEL),
data=model, REML=T)
#linea+quadratic; model 3
Model3<-lmer(Prob ~ IA_LABEL*(X1+X2) +
(1+X1+X2| Subject) +
(1+X1+X2| Subject:IA_LABEL),
data=model, REML=T)
#Model 4
Model4<-lmer(Prob ~ IA_LABEL*(X1+X2+X3) +
(1+X1+X2+X3| Subject) +
(1+X1+X2+X3| Subject:IA_LABEL),
data=model, REML=T)
#Model 5 failed to converge
Model5<-lmer(Prob ~ IA_LABEL*(X1+X2+X3+X4) +
(1+X1+X2+X3+X4| Subject) +
(1+X1+X2+X3+X4| Subject:IA_LABEL),
data=model, REML=T)
#compare the models based on ANOVA test
anova(Model1,Model2,Model3,Model4,Model5)
## Data: model
## Models:
## object: Prob ~ IA_LABEL + (1 | Subject) + (1 | Subject:IA_LABEL)
## ..1: Prob ~ IA_LABEL * X1 + (1 + X1 | Subject) + (1 + X1 | Subject:IA_LABEL)
## ..2: Prob ~ IA_LABEL * (X1 + X2) + (1 + X1 + X2 | Subject) + (1 +
## ..2: X1 + X2 | Subject:IA_LABEL)
## ..3: Prob ~ IA_LABEL * (X1 + X2 + X3) + (1 + X1 + X2 + X3 | Subject) +
## ..3: (1 + X1 + X2 + X3 | Subject:IA_LABEL)
## ..4: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 + X4 |
## ..4: Subject) + (1 + X1 + X2 + X3 + X4 | Subject:IA_LABEL)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## object 5 -6383.3 -6353.3 3196.6 -6393.3
## ..1 11 -7914.3 -7848.4 3968.2 -7936.3 1543.05 6 < 2.2e-16 ***
## ..2 19 -8964.0 -8850.2 4501.0 -9002.0 1065.71 8 < 2.2e-16 ***
## ..3 29 -9723.4 -9549.6 4890.7 -9781.4 779.32 10 < 2.2e-16 ***
## ..4 41 -10219.9 -9974.3 5151.0 -10301.9 520.59 12 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#compare the models based on R^2 based on a paper from Xu (http://onlinelibrary.wiley.com/doi/10.1002/sim.1572/abstract)
1-var(residuals(Model1))/(var(model.response(model.frame(Model1))))
## [1] 0.1417364
1-var(residuals(Model2))/(var(model.response(model.frame(Model2))))
## [1] 0.5492991
1-var(residuals(Model3))/(var(model.response(model.frame(Model3))))
## [1] 0.7185989
1-var(residuals(Model4))/(var(model.response(model.frame(Model4))))
## [1] 0.8057794
1-var(residuals(Model5))/(var(model.response(model.frame(Model5))))
## [1] 0.8502332
#find the summary for model 4
summary(Model5)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 + X4 |
## Subject) + (1 + X1 + X2 + X3 + X4 | Subject:IA_LABEL)
## Data: model
##
## REML criterion at convergence: -10240.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9014 -0.5771 0.0169 0.6106 4.6871
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject:IA_LABEL (Intercept) 0.0006814 0.02610
## X1 0.0658951 0.25670 -0.24
## X2 0.0293725 0.17138 -0.28 -0.07
## X3 0.0226513 0.15050 0.22 -0.61 -0.05
## X4 0.0122534 0.11069 0.04 0.02 -0.80 0.05
## Subject (Intercept) 0.0003294 0.01815
## X1 0.0281716 0.16784 0.14
## X2 0.0017371 0.04168 0.17 -0.51
## X3 0.0012596 0.03549 -0.60 -0.67 -0.22
## X4 0.0004017 0.02004 0.22 -0.02 0.87 -0.62
## Residual 0.0012364 0.03516
## Number of obs: 2952, groups: Subject:IA_LABEL, 72; Subject, 36
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.144326 0.005377 63.630000 26.839 < 2e-16
## IA_LABELCompetitorLow -0.014456 0.006287 36.180000 -2.299 0.0274
## X1 -0.134215 0.051452 64.380000 -2.609 0.0113
## X2 -0.152413 0.029975 69.810000 -5.085 2.96e-06
## X3 0.047888 0.026430 69.830000 1.812 0.0743
## X4 0.002626 0.019644 69.950000 0.134 0.8940
## IA_LABELCompetitorLow :X1 -0.066810 0.061070 38.800000 -1.094 0.2807
## IA_LABELCompetitorLow :X2 0.027994 0.041237 61.880000 0.679 0.4998
## IA_LABELCompetitorLow :X3 -0.013211 0.036429 65.580000 -0.363 0.7180
## IA_LABELCompetitorLow :X4 0.004272 0.027376 63.700000 0.156 0.8765
##
## (Intercept) ***
## IA_LABELCompetitorLow *
## X1 *
## X2 ***
## X3 .
## X4
## IA_LABELCompetitorLow :X1
## IA_LABELCompetitorLow :X2
## IA_LABELCompetitorLow :X3
## IA_LABELCompetitorLow :X4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) IA_LAB X1 X2 X3 X4 IA_LABELCL:X1
## IA_LABELCmL -0.585
## X1 -0.120 0.139
## X2 -0.194 0.185 -0.120
## X3 0.095 -0.146 -0.564 -0.061
## X4 0.048 -0.023 0.010 -0.686 0.021
## IA_LABELCL:X1 0.137 -0.234 -0.593 0.047 0.406 -0.011
## IA_LABELCL:X2 0.157 -0.269 0.040 -0.688 0.036 0.523 -0.068
## IA_LABELCL:X3 -0.124 0.212 0.350 0.036 -0.689 -0.032 -0.590
## IA_LABELCL:X4 -0.019 0.033 -0.009 0.517 -0.032 -0.697 0.015
## IA_LABELCL:X2 IA_LABELCL:X3
## IA_LABELCmL
## X1
## X2
## X3
## X4
## IA_LABELCL:X1
## IA_LABELCL:X2
## IA_LABELCL:X3 -0.052
## IA_LABELCL:X4 -0.751 0.046
#find the RMSE
sqrt(mean((model$Prob-fitted(Model5))^2))
## [1] 0.03313396
#attach the fitted values to the dataset model, so that they can be used later for graphing purposes
model$Model1Fitted<-fitted(Model1)
model$Model2Fitted<-fitted(Model2)
model$Model3Fitted<-fitted(Model3)
model$Model4Fitted<-fitted(Model4)
model$Model5Fitted<-fitted(Model5)
#check that the values are in the dataset
str(model)
## 'data.frame': 2952 obs. of 13 variables:
## $ time : num 2025 2025 2025 2025 2025 ...
## $ Subject : Factor w/ 36 levels "10spe2","11spe2",..: 1 1 2 3 4 5 6 7 7 8 ...
## $ IA_LABEL : Factor w/ 2 levels "CompetitorHigh ",..: 1 2 2 1 1 2 1 1 2 2 ...
## $ Prob : num 0.148 0.111 0.148 0.111 0.148 ...
## $ X1 : num -0.264 -0.264 -0.264 -0.264 -0.264 ...
## $ X2 : num 0.325 0.325 0.325 0.325 0.325 ...
## $ X3 : num -0.357 -0.357 -0.357 -0.357 -0.357 ...
## $ X4 : num 0.367 0.367 0.367 0.367 0.367 ...
## $ Model1Fitted: num 0.147 0.154 0.124 0.152 0.117 ...
## $ Model2Fitted: num 0.173 0.206 0.171 0.104 0.167 ...
## $ Model3Fitted: num 0.1926 0.0575 0.1382 0.0534 0.1276 ...
## $ Model4Fitted: num 0.1187 0.0457 0.1472 0.1246 0.1575 ...
## $ Model5Fitted: num 0.12 0.158 0.151 0.108 0.177 ...
Model 5 performs the best, though LMER is not very susceptible to corss-validation, so the addition of more polynomials might be leading to overfitting
Each model used above will be graphed by overlaying it on the real data. First, the plots for the first comparison will be created. For this, I use the model dataset. I put the data in long only format.
ModelLong<- model %>% gather(Model,Predictions,9:13)
#rename the models so that they are more clear
levels(ModelLong$Model) <- list(`Model 1 Intercept`="Model1Fitted", `Model 2 Linear`="Model2Fitted", `Model 3 Quadratic`="Model3Fitted", `Model 4 Cubic`="Model4Fitted", `Model 5 Quartic`="Model5Fitted")
#rename some of the colums
colnames(ModelLong)[3]<-"Interest Area"
ModelLong[,3]<-factor(ModelLong[,3])
#compress the data(find the average over subjects)
ForGraph1<-ModelLong %>% group_by(`Interest Area`,time, Model) %>%
summarise(`Fixation Probability`=mean(Prob),Predicted=mean(Predictions))
levels(ForGraph1$`Interest Area`) <- list(`High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ")
The code chunk below will plot and save the graph
#color palette for color-blind
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
#create plot1 for Experiment 1, which goes in the publication
g3<-ggplot(ForGraph1, aes(x=time-2025, y=`Fixation Probability`, shape=`Interest Area`, color=`Interest Area`)) +
geom_point(size=3) + geom_line(aes(y=Predicted), size=1)+facet_wrap(~Model, nrow=3, ncol=3)+
scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
ggtitle("Fixation Proportion for Models\n examining High vs. Low Competitors")+
theme_bw()+
scale_x_continuous(breaks=seq(0,1000,100), limits=c(0,1000), name="time 200ms after spoken word onset(ms)")+
theme(legend.text = element_text(size = 12))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(axis.title.x = element_text(face="bold", size=15), axis.text.x= element_text(size=12, angle=90, vjust=0.5))+
theme(axis.title.y = element_text(face="bold", size=15), axis.text.y = element_text(size=12))+
theme(strip.text.x = element_text(size=12, face="bold"))+
theme(plot.title = element_text(lineheight=.8, face="bold", size=14))
g3
ggsave(
"Plot3Exp2.png",
g3,
width = 9.25,
height = 6.25,
dpi = 300
)
Session Info
sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] caret_6.0-64 lattice_0.20-33 gridExtra_2.2.1
## [4] tidyr_0.4.1 doParallel_1.0.10 iterators_1.0.8
## [7] foreach_1.4.3 dplyr_0.4.3 lmerTest_2.0-30
## [10] lme4_1.1-11 Matrix_1.2-4 ggplot2_2.1.0
##
## loaded via a namespace (and not attached):
## [1] reshape2_1.4.1 splines_3.2.3 colorspace_1.2-6
## [4] htmltools_0.3 stats4_3.2.3 yaml_2.1.13
## [7] mgcv_1.8-12 survival_2.38-3 nloptr_1.0.4
## [10] foreign_0.8-66 DBI_0.3.1 RColorBrewer_1.1-2
## [13] plyr_1.8.3 stringr_1.0.0 MatrixModels_0.4-1
## [16] munsell_0.4.3 gtable_0.2.0 codetools_0.2-14
## [19] evaluate_0.8.3 labeling_0.3 latticeExtra_0.6-28
## [22] knitr_1.12.3 SparseM_1.7 quantreg_5.21
## [25] pbkrtest_0.4-6 Rcpp_0.12.4 acepack_1.3-3.3
## [28] scales_0.4.0 formatR_1.3 Hmisc_3.17-2
## [31] digest_0.6.9 stringi_1.0-1 grid_3.2.3
## [34] tools_3.2.3 magrittr_1.5 lazyeval_0.1.10
## [37] Formula_1.2-1 cluster_2.0.3 car_2.1-1
## [40] MASS_7.3-45 assertthat_0.1 minqa_1.2.4
## [43] rmarkdown_0.9.5 R6_2.1.2 rpart_4.1-10
## [46] nnet_7.3-12 nlme_3.1-126