The aim of this document is to provide all the analyses for Experiment 1 Overlap all 3. In this experiment, participants were presented with displays of 4 words, among which were the target, a high phonological competitor, a low phonological competitor, and a distractor. The structure of the document will be as follows:
Data Cleaning and Pre-processing
Exploratory Analyses
Growth Curve Analysis
Graphs
Load the required libraries
library(ggplot2)
library(lme4)
library(lmerTest)
library(dplyr)
library(parallel)
library(doParallel)
library(tidyr)
library(gridExtra)
library(caret)
First, I load the data file indicating the trials that were removed for each subject.
deletedTrials<-read.csv("delete.csv")
Now I load up the csv files with Exp1 data. There are two files (Exp1Part1 and Exp1Part2). Not all colums in those files are required. Only the following colums will be loaded in R, so as to preserve memory space. The colums are: RECORDING_SESSION_LABEL, IA_FIRST_RUN_START_TIME, IA_FIRST_RUN_START_TIME, IA_FIRST_RUN_END_TIME, IA_SECOND_RUN_START_TIME, IA_SECOND_RUN_END_TIME, IA_THIRD_RUN_START_TIME, IA_THIRD_RUN_END_TIME, IA_LABEL, RESPONSE_ACC, IA_DWELL_TIME, TRIAL_INDEX, target, RESPONSE_RT,trialtype,target, list, oldnewitem
Part1 <- read.csv("Exp1Part1.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]
#now load up part 2
Part2 <- read.csv("Exp1Part2.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]
#combine part 1 and part 2 into one dataframe
EyeData<-rbind(Part1, Part2)
Right now we have a dataset with all the subjects that took part in Experiment 1. In the code below, I perform some preprocessing that aims to do the following: Remove bad trials (e.g., trials with a lot of off-screen fixations or trials in which the fixations don’t start at the center of the screen or trials in which the tracking of the eye was not very good. The decision about which trials should be removed was done by two graduate students, Stas and Julie.) The deleted trials can be found in the deletedTrials file.
#this part of the code is not very effective; need to figure out later how to optimize it.
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="1sse1" & EyeData$TRIAL_INDEX==50),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="12spe1" & EyeData$TRIAL_INDEX==73),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="7spe1" & EyeData$TRIAL_INDEX==38),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="4tle1" & EyeData$TRIAL_INDEX==53),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="9spe1" & EyeData$TRIAL_INDEX==87),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="11spe2" & EyeData$TRIAL_INDEX==11),]
24/10880
## [1] 0.002205882
Only 6 trials have been deleted, which constitutes less than .22% of the total number of trials.
In the code below, I transform the NAs to zero, calculate the RT from the onset of the word display till the mouse click time is made.
#change all the NAs to zeros
EyeData[is.na(EyeData)] <- 0
#Calculate RT from the display onset until the mouse click is made
EyeData$RESPONSE_RT<-EyeData$MOUSE_TIME-EyeData$DISPLAY_TIME
#check the dimentions of the dataset
dim(EyeData)
## [1] 10880 18
#remove the MOUSE_TIME and DISPLAY_TIME variables
EyeData<-EyeData %>% select (1:16)
#rename the RECORDING_SESSION_LABEL into Subject
colnames(EyeData)[1] <- "Subject"
#examine colum names
names(EyeData)
## [1] "Subject" "IA_FIRST_RUN_START_TIME"
## [3] "IA_FIRST_RUN_START_TIME.1" "IA_FIRST_RUN_END_TIME"
## [5] "IA_SECOND_RUN_START_TIME" "IA_SECOND_RUN_END_TIME"
## [7] "IA_THIRD_RUN_START_TIME" "IA_THIRD_RUN_END_TIME"
## [9] "IA_LABEL" "RESPONSE_ACC"
## [11] "IA_DWELL_TIME" "TRIAL_INDEX"
## [13] "target" "RESPONSE_RT"
## [15] "trialtype" "oldnewitem"
Examine the structure of the dataframe
str(EyeData)
## 'data.frame': 10880 obs. of 16 variables:
## $ Subject : Factor w/ 29 levels "10spe1","11spe2",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ IA_FIRST_RUN_START_TIME : num 2788 0 2350 2548 2126 ...
## $ IA_FIRST_RUN_START_TIME.1: num 2788 0 2350 2548 2126 ...
## $ IA_FIRST_RUN_END_TIME : num 3912 0 2495 2726 3096 ...
## $ IA_SECOND_RUN_START_TIME : num 0 0 0 0 0 ...
## $ IA_SECOND_RUN_END_TIME : num 0 0 0 0 0 ...
## $ IA_THIRD_RUN_START_TIME : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IA_THIRD_RUN_END_TIME : num 0 0 0 0 0 0 0 0 0 0 ...
## $ IA_LABEL : Factor w/ 9 levels "CompetitorHigh ",..: 3 1 2 4 3 1 2 4 3 1 ...
## $ RESPONSE_ACC : int 1 1 1 1 1 1 1 1 1 1 ...
## $ IA_DWELL_TIME : int 1115 0 146 179 971 0 0 0 614 281 ...
## $ TRIAL_INDEX : int 1 1 1 1 2 2 2 2 3 3 ...
## $ target : Factor w/ 94 levels "bald","bang",..: 39 39 39 39 31 31 31 31 16 16 ...
## $ RESPONSE_RT : num 2204 2204 2204 2204 1380 ...
## $ trialtype : Factor w/ 4 levels "Exp","FillerRel",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ oldnewitem : Factor w/ 3 levels "New","Old","x": 3 3 3 3 3 3 3 3 3 3 ...
Subset the experimental trials. This will remove all other trialtypes, such as FillerRel, FillerUnrel and Practice trials
EyeData <- EyeData %>% filter(trialtype=="Exp")
Check subject accuracy
SubjectAccuracy<- EyeData %>% group_by(Subject) %>%
summarise(MeanAccuracy=mean(RESPONSE_ACC)) %>%
arrange(MeanAccuracy)
SubjectAccuracy
## Source: local data frame [29 x 2]
##
## Subject MeanAccuracy
## (fctr) (dbl)
## 1 1spe1 0.9666667
## 2 1tle1 0.9666667
## 3 2sre1 0.9666667
## 4 4sre1 0.9666667
## 5 8spe1 0.9666667
## 6 10spe1 1.0000000
## 7 11spe2 1.0000000
## 8 1sse1 1.0000000
## 9 2spe1 1.0000000
## 10 2sse1 1.0000000
## .. ... ...
mean(SubjectAccuracy$MeanAccuracy)
## [1] 0.9942529
Subject accuracy is very high. Allmost all the subjects have 100% accuracy. Only 5 participants have a few incorrect trials.
Remove trials with incorrect accuracy
CorrectEyeData<-EyeData %>% filter(RESPONSE_ACC==1)
#number of rows deleted
nrow(EyeData)-nrow(CorrectEyeData) #20 rows get deleted, which corresponds to 5 incorrect trials
## [1] 20
20/3468 #thats about .5767% of experimental trials
## [1] 0.005767013
Set up the start time and the end time dummy-coding of each run in the interest area. This creates colums where if a fixation exists during a run, then the run gets 1 (exists) or if there is no fixation in the IA during a run, then the coding is 0 (not exist)
CorrectEyeData$Fststart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)), 0)
CorrectEyeData$Fstend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_END_TIME)), 0)
CorrectEyeData$Secstart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)), 0)
CorrectEyeData$Secend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_END_TIME)), 0)
CorrectEyeData$Thirdstart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)), 0)
CorrectEyeData$Thirdend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_END_TIME)), 0)
Generate time bins from time 0 to time 6000 in 25 ms bins and assign them to the dataset
time <- seq(0, 6000, by=25)
tmatrix <- matrix(nrow=nrow(CorrectEyeData), ncol=length(time))
dim(tmatrix)
## [1] 3448 241
Generate time vectors for each row and column for first, second, and third pass viewings so that viewing periods receive a viewing probability value of 1
registerDoParallel(3)
for(i in 1:nrow(tmatrix)) {
for(j in 1:length(time)) {
tmatrix[i,j] <- ifelse(CorrectEyeData$Fststart[i] < time[j] &
CorrectEyeData$Fstend[i] > time[j] |CorrectEyeData$Secstart[i] <
time[j] & CorrectEyeData$Secend[i] > time[j] | CorrectEyeData$Thirdstart[i]
< time[j] & CorrectEyeData$Thirdend[i]>time[j], 1,0)
}
}
Combine the CleanEyeData with the time matrix
CleanData <- cbind(CorrectEyeData, data.frame(tmatrix))
Assign time values to time bin columns
colnames(CleanData)[23:263] <- seq(0, 6000, by=25)
Subset the dataset with only the necessary colums and remove anything in the memory that might be a memory hog
CleanData <- CleanData[, -c(2:8,10,11,12,13,14, 17:22)]
Put the data in long-format and then calculate the proportion for each time bin for each subject
CleanData<- CleanData %>% gather(time,value,5:245)
#find proportion for each interest area for each subject
CleanDataProb<-CleanData %>% group_by(Subject,IA_LABEL,time) %>%
summarise(Prob=mean(value))
CleanDataProb$time<-as.numeric(as.character(CleanDataProb$time))
The aim of this section is to explore pattern in the data without commiting to any formal modeling.
which Competitor is the stronger one?
#subset the required dataset by averaging over subjects; find the usual
OnlySubjects<-CleanData %>%
group_by(IA_LABEL, time) %>%
summarise(n=n(),Prob=mean(value),sd=sd(value)) %>%
mutate(se=sd/sqrt(n), LCI=Prob+qnorm(0.025)*se, UCI=Prob+qnorm(0.975)*se)
OnlySubjects$time<-as.numeric(as.character(OnlySubjects$time))
levels(OnlySubjects$IA_LABEL) <- list(Target="TARGET_IA ", `High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ", `Distractor`="UNREL2_IA ")
colnames(OnlySubjects)[1]<-"Interest Area"
#plot for time 0 to time 6000
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
g1<-ggplot(OnlySubjects, aes(x=time, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=LCI, ymax=UCI)) +
scale_x_continuous(breaks=seq(0,6000,500), limits=c(0,6000), name="time from trial onset (ms)")+
geom_vline(xintercept = 1825)+
geom_vline(xintercept = 3425)+theme_bw()+
scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
scale_y_continuous(name="Fixation Proportion")+
theme(legend.text = element_text(size = 12))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(axis.title.x = element_text(face="bold", size=15), axis.text.x = element_text(size=10))+
theme(axis.title.y = element_text(face="bold", size=15), axis.text.y = element_text(size=10))+
theme(strip.text.x = element_text(size=12, face="bold"))+
theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+theme_bw()
g1
ggsave(
"Plot1Exp1.png",
g1,
width = 9.25,
height = 6.25,
dpi = 300
)
OnlySubjects$time2<-OnlySubjects$time-1825
g2<-ggplot(OnlySubjects, aes(x=time2, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
geom_point(size=3) +
geom_errorbar(aes(ymin=LCI, ymax=UCI))+
scale_x_continuous(breaks=seq(0,1500,50), limits=c(0,1500), name="time from spoken target onset(ms)")+
theme_bw()+
scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
scale_y_continuous(name="Fixation Proportion")+
theme(legend.text = element_text(size = 12))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(axis.title.x = element_text(face="bold", size=15), axis.text.x = element_text(size=12, angle=90, vjust=0.5))+
theme(axis.title.y = element_text(face="bold", size=15), axis.text.y = element_text(size=12))+
theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+
theme(strip.text.x = element_text(size=12, face="bold"))
g2
#the plots shows some warnings because it removes values before display onset and after display onset
ggsave(
"Plot2Exp1.png",
g2,
width = 9.25,
height = 6.25,
dpi = 300
)
The following analyses will examine 1. the proportion fixations between the High and Low Competitor
First, the required subset is created:
model<-CleanDataProb %>%
filter(IA_LABEL=="CompetitorHigh " | IA_LABEL=="CompetitorLow ", time>=2025 & time <=3025)
model$IA_LABEL<-as.factor(as.character(model$IA_LABEL))
#Generate time polynomials up to quadratic poly.
t25 <- data.frame(poly(unique(model$time),4))
t25$time <- seq(2025, 3025, by=25)
#add polynomials to data frame
model <- merge(model , t25, by="time")
head(model)
## time Subject IA_LABEL Prob X1 X2 X3
## 1 2025 10spe1 CompetitorHigh 0.10000000 -0.2639818 0.3245611 -0.3568831
## 2 2025 10spe1 CompetitorLow 0.03333333 -0.2639818 0.3245611 -0.3568831
## 3 2025 11spe2 CompetitorLow 0.10344828 -0.2639818 0.3245611 -0.3568831
## 4 2025 1spe1 CompetitorHigh 0.03448276 -0.2639818 0.3245611 -0.3568831
## 5 2025 1sse1 CompetitorHigh 0.30000000 -0.2639818 0.3245611 -0.3568831
## 6 2025 1tle1 CompetitorLow 0.06896552 -0.2639818 0.3245611 -0.3568831
## X4
## 1 0.3669381
## 2 0.3669381
## 3 0.3669381
## 4 0.3669381
## 5 0.3669381
## 6 0.3669381
str(model)
## 'data.frame': 2378 obs. of 8 variables:
## $ time : num 2025 2025 2025 2025 2025 ...
## $ Subject : Factor w/ 29 levels "10spe1","11spe2",..: 1 1 2 3 4 5 6 7 7 8 ...
## $ IA_LABEL: Factor w/ 2 levels "CompetitorHigh ",..: 1 2 2 1 1 2 1 1 2 2 ...
## $ Prob : num 0.1 0.0333 0.1034 0.0345 0.3 ...
## $ X1 : num -0.264 -0.264 -0.264 -0.264 -0.264 ...
## $ X2 : num 0.325 0.325 0.325 0.325 0.325 ...
## $ X3 : num -0.357 -0.357 -0.357 -0.357 -0.357 ...
## $ X4 : num 0.367 0.367 0.367 0.367 0.367 ...
The following models are run, with increasing complexity:
registerDoParallel(3)
#base model; Model1
Model1<-lmer(Prob ~ IA_LABEL +
(1| Subject) +
(1| Subject:IA_LABEL),
data=model, REML=T)
#linear; model 2
Model2<-lmer(Prob ~ IA_LABEL*X1 +
(1+X1| Subject) +
(1+X1| Subject:IA_LABEL),
data=model, REML=T)
#linea+quadratic; model 3
Model3<-lmer(Prob ~ IA_LABEL*(X1+X2) +
(1+X1+X2| Subject) +
(1+X1+X2| Subject:IA_LABEL),
data=model, REML=T)
#Model 4
Model4<-lmer(Prob ~ IA_LABEL*(X1+X2+X3) +
(1+X1+X2+X3| Subject) +
(1+X1+X2+X3| Subject:IA_LABEL),
data=model, REML=T)
#Model 5 failed to converge
Model5<-lmer(Prob ~ IA_LABEL*(X1+X2+X3+X4) +
(1+X1+X2+X3+X4| Subject) +
(1+X1+X2+X3+X4| Subject:IA_LABEL),
data=model, REML=T)
#Model 6 converged
Model6<-lmer(Prob ~ IA_LABEL*(X1+X2+X3+X4) +
(1+X1+X2+X3| Subject) +
(1+X1+X2+X3+X4| Subject:IA_LABEL),
data=model, REML=T)
#Model 7 converged
Model7<-lmer(Prob ~ IA_LABEL*(X1+X2+X3+X4) +
(1+X1+X2+X3+X4| Subject) +
(1+X1+X2+X3| Subject:IA_LABEL),
data=model, REML=T)
library(beepr)
beep(8)
#compare the models based on ANOVA test
anova(Model1,Model2,Model3,Model4,Model5,Model6,Model7)
## Data: model
## Models:
## object: Prob ~ IA_LABEL + (1 | Subject) + (1 | Subject:IA_LABEL)
## ..1: Prob ~ IA_LABEL * X1 + (1 + X1 | Subject) + (1 + X1 | Subject:IA_LABEL)
## ..2: Prob ~ IA_LABEL * (X1 + X2) + (1 + X1 + X2 | Subject) + (1 +
## ..2: X1 + X2 | Subject:IA_LABEL)
## ..3: Prob ~ IA_LABEL * (X1 + X2 + X3) + (1 + X1 + X2 + X3 | Subject) +
## ..3: (1 + X1 + X2 + X3 | Subject:IA_LABEL)
## ..5: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 | Subject) +
## ..5: (1 + X1 + X2 + X3 + X4 | Subject:IA_LABEL)
## ..6: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 + X4 |
## ..6: Subject) + (1 + X1 + X2 + X3 | Subject:IA_LABEL)
## ..4: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 + X4 |
## ..4: Subject) + (1 + X1 + X2 + X3 + X4 | Subject:IA_LABEL)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## object 5 -4870.7 -4841.8 2440.4 -4880.7
## ..1 11 -5932.2 -5868.6 2977.1 -5954.2 1073.44 6 <2e-16 ***
## ..2 19 -7157.9 -7048.2 3598.0 -7195.9 1241.77 8 <2e-16 ***
## ..3 29 -7823.7 -7656.3 3940.9 -7881.7 685.80 10 <2e-16 ***
## ..5 36 -8215.0 -8007.2 4143.5 -8287.0 405.33 7 <2e-16 ***
## ..6 36 -8083.3 -7875.4 4077.6 -8155.3 0.00 0 1
## ..4 41 -8209.8 -7973.1 4145.9 -8291.8 136.51 5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#compare the models based on R^2 based on a paper from Xu (http://onlinelibrary.wiley.com/doi/10.1002/sim.1572/abstract)
1-var(residuals(Model1))/(var(model.response(model.frame(Model1))))
## [1] 0.09853359
1-var(residuals(Model2))/(var(model.response(model.frame(Model2))))
## [1] 0.4825567
1-var(residuals(Model3))/(var(model.response(model.frame(Model3))))
## [1] 0.7260332
1-var(residuals(Model4))/(var(model.response(model.frame(Model4))))
## [1] 0.8161727
1-var(residuals(Model5))/(var(model.response(model.frame(Model5))))
## [1] 0.8589269
1-var(residuals(Model6))/(var(model.response(model.frame(Model6))))
## [1] 0.858976
1-var(residuals(Model7))/(var(model.response(model.frame(Model7))))
## [1] 0.8443338
#find the summary for model 4
summary(Model4)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula:
## Prob ~ IA_LABEL * (X1 + X2 + X3) + (1 + X1 + X2 + X3 | Subject) +
## (1 + X1 + X2 + X3 | Subject:IA_LABEL)
## Data: model
##
## REML criterion at convergence: -7834
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4260 -0.6105 -0.0173 0.6013 3.1867
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject:IA_LABEL (Intercept) 0.0004802 0.02191
## X1 0.0365941 0.19130 0.23
## X2 0.0348028 0.18655 -0.51 -0.39
## X3 0.0196494 0.14018 0.03 -0.37 -0.12
## Subject (Intercept) 0.0002257 0.01502
## X1 0.0348023 0.18655 -0.17
## X2 0.0067362 0.08207 0.15 -1.00
## X3 0.0000819 0.00905 -0.23 -0.92 0.93
## Residual 0.0016023 0.04003
## Number of obs: 2378, groups: Subject:IA_LABEL, 58; Subject, 29
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.148103 0.005068 51.290000 29.221 < 2e-16
## IA_LABELCompetitorLow -0.020558 0.005984 29.710000 -3.435 0.00177
## X1 -0.256111 0.050172 45.610000 -5.105 6.30e-06
## X2 -0.252285 0.038570 54.650000 -6.541 2.18e-08
## X3 0.123652 0.027123 56.000000 4.559 2.84e-05
## IA_LABELCompetitorLow :X1 0.053181 0.051325 29.630000 1.036 0.30851
## IA_LABELCompetitorLow :X2 0.150451 0.050107 44.550000 3.003 0.00438
## IA_LABELCompetitorLow :X3 -0.081067 0.038284 55.810000 -2.118 0.03868
##
## (Intercept) ***
## IA_LABELCompetitorLow **
## X1 ***
## X2 ***
## X3 ***
## IA_LABELCompetitorLow :X1
## IA_LABELCompetitorLow :X2 **
## IA_LABELCompetitorLow :X3 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) IA_LAB X1 X2 X3 IA_LABELCL:X1
## IA_LABELCmL -0.590
## X1 0.068 -0.113
## X2 -0.337 0.313 -0.519
## X3 0.014 -0.018 -0.291 -0.079
## IA_LABELCL:X1 -0.130 0.221 -0.511 0.241 0.246
## IA_LABELCL:X2 0.284 -0.482 0.189 -0.650 0.078 -0.370
## IA_LABELCL:X3 -0.015 0.026 0.179 0.072 -0.706 -0.349
## IA_LABELCL:X2
## IA_LABELCmL
## X1
## X2
## X3
## IA_LABELCL:X1
## IA_LABELCL:X2
## IA_LABELCL:X3 -0.111
#find summary for model 7
summary(Model7)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 + X4 |
## Subject) + (1 + X1 + X2 + X3 | Subject:IA_LABEL)
## Data: model
##
## REML criterion at convergence: -8093.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7107 -0.5966 0.0021 0.6058 3.3543
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject:IA_LABEL (Intercept) 0.0004744 0.02178
## X1 0.0367525 0.19171 0.22
## X2 0.0316077 0.17779 -0.49 -0.37
## X3 0.0198548 0.14091 0.03 -0.37 -0.13
## Subject (Intercept) 0.0002370 0.01540
## X1 0.0348712 0.18674 -0.15
## X2 0.0101589 0.10079 -0.01 -0.87
## X3 0.0001041 0.01020 -0.31 -0.76 0.95
## X4 0.0065839 0.08114 0.15 0.58 -0.91 -0.92
## Residual 0.0013747 0.03708
## Number of obs: 2378, groups: Subject:IA_LABEL, 58; Subject, 29
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 1.481e-01 5.068e-03 5.090e+01 29.221
## IA_LABELCompetitorLow -2.056e-02 5.919e-03 3.110e+01 -3.473
## X1 -2.561e-01 5.017e-02 4.560e+01 -5.105
## X2 -2.523e-01 3.857e-02 5.310e+01 -6.541
## X3 1.237e-01 2.712e-02 5.600e+01 4.559
## X4 6.378e-02 1.657e-02 3.350e+01 3.850
## IA_LABELCompetitorLow :X1 5.318e-02 5.128e-02 3.010e+01 1.037
## IA_LABELCompetitorLow :X2 1.505e-01 4.769e-02 5.110e+01 3.155
## IA_LABELCompetitorLow :X3 -8.107e-02 3.826e-02 5.590e+01 -2.119
## IA_LABELCompetitorLow :X4 -5.130e-02 9.737e-03 2.116e+03 -5.269
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## IA_LABELCompetitorLow 0.001534 **
## X1 6.30e-06 ***
## X2 2.46e-08 ***
## X3 2.84e-05 ***
## X4 0.000504 ***
## IA_LABELCompetitorLow :X1 0.307937
## IA_LABELCompetitorLow :X2 0.002692 **
## IA_LABELCompetitorLow :X3 0.038575 *
## IA_LABELCompetitorLow :X4 1.51e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) IA_LAB X1 X2 X3 X4 IA_LABELCL:X1
## IA_LABELCmL -0.584
## X1 0.068 -0.107
## X2 -0.337 0.287 -0.519
## X3 0.014 -0.022 -0.291 -0.079
## X4 0.075 0.000 0.364 -0.400 -0.059
## IA_LABELCL:X1 -0.122 0.210 -0.511 0.222 0.249 0.000
## IA_LABELCL:X2 0.271 -0.463 0.184 -0.618 0.090 0.000 -0.359
## IA_LABELCL:X3 -0.018 0.032 0.180 0.079 -0.705 0.000 -0.353
## IA_LABELCL:X4 0.000 0.000 0.000 0.000 0.000 -0.294 0.000
## IA_LABELCL:X2 IA_LABELCL:X3
## IA_LABELCmL
## X1
## X2
## X3
## X4
## IA_LABELCL:X1
## IA_LABELCL:X2
## IA_LABELCL:X3 -0.127
## IA_LABELCL:X4 0.000 0.000
#find summary for model 5
summary(Model6)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
## to degrees of freedom [lmerMod]
## Formula:
## Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 | Subject) +
## (1 + X1 + X2 + X3 + X4 | Subject:IA_LABEL)
## Data: model
##
## REML criterion at convergence: -8226.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.8755 -0.5778 -0.0134 0.5740 3.4955
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject:IA_LABEL (Intercept) 4.869e-04 0.022065
## X1 4.245e-02 0.206030 0.25
## X2 3.817e-02 0.195379 -0.52 -0.46
## X3 2.005e-02 0.141590 0.02 -0.36 -0.10
## X4 1.099e-02 0.104841 0.07 0.33 -0.56 -0.03
## Subject (Intercept) 2.269e-04 0.015064
## X1 2.637e-02 0.162400 -0.27
## X2 2.432e-03 0.049316 0.48 -0.97
## X3 2.071e-05 0.004551 0.15 -0.99 0.94
## Residual 1.260e-03 0.035491
## Number of obs: 2378, groups: Subject:IA_LABEL, 58; Subject, 29
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 0.148103 0.005067 51.300000 29.230 < 2e-16
## IA_LABELCompetitorLow -0.020558 0.005975 31.040000 -3.441 0.00168
## X1 -0.256111 0.049159 52.000000 -5.210 3.29e-06
## X2 -0.252285 0.037995 57.800000 -6.640 1.19e-08
## X3 0.123652 0.027119 56.030000 4.560 2.83e-05
## X4 0.063784 0.020554 56.000000 3.103 0.00300
## IA_LABELCompetitorLow :X1 0.053181 0.054903 25.260000 0.969 0.34191
## IA_LABELCompetitorLow :X2 0.150451 0.052149 48.110000 2.885 0.00584
## IA_LABELCompetitorLow :X3 -0.081067 0.038334 55.940000 -2.115 0.03892
## IA_LABELCompetitorLow :X4 -0.051300 0.029067 56.000000 -1.765 0.08304
##
## (Intercept) ***
## IA_LABELCompetitorLow **
## X1 ***
## X2 ***
## X3 ***
## X4 **
## IA_LABELCompetitorLow :X1
## IA_LABELCompetitorLow :X2 **
## IA_LABELCompetitorLow :X3 *
## IA_LABELCompetitorLow :X4 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) IA_LAB X1 X2 X3 X4 IA_LABELCL:X1
## IA_LABELCmL -0.590
## X1 0.064 -0.133
## X2 -0.337 0.340 -0.487
## X3 0.014 -0.010 -0.294 -0.083
## X4 0.053 -0.045 0.241 -0.502 -0.029
## IA_LABELCL:X1 -0.140 0.237 -0.558 0.307 0.246 -0.215
## IA_LABELCL:X2 0.293 -0.496 0.250 -0.686 0.066 0.366 -0.447
## IA_LABELCL:X3 -0.008 0.014 0.194 0.064 -0.707 0.021 -0.348
## IA_LABELCL:X4 -0.037 0.063 -0.170 0.355 0.021 -0.707 0.305
## IA_LABELCL:X2 IA_LABELCL:X3
## IA_LABELCmL
## X1
## X2
## X3
## X4
## IA_LABELCL:X1
## IA_LABELCL:X2
## IA_LABELCL:X3 -0.093
## IA_LABELCL:X4 -0.518 -0.029
#find the RMSE
sqrt(mean((model$Prob-fitted(Model4))^2))
## [1] 0.03816047
#attach the fitted values to the dataset model, so that they can be used later for graphing purposes
model$Model1Fitted<-fitted(Model1)
model$Model2Fitted<-fitted(Model2)
model$Model3Fitted<-fitted(Model3)
model$Model4Fitted<-fitted(Model4)
model$Model5Fitted<-fitted(Model5)
model$Model6Fitted<-fitted(Model6)
model$Model7Fitted<-fitted(Model7)
#check that the values are in the dataset
str(model)
## 'data.frame': 2378 obs. of 15 variables:
## $ time : num 2025 2025 2025 2025 2025 ...
## $ Subject : Factor w/ 29 levels "10spe1","11spe2",..: 1 1 2 3 4 5 6 7 7 8 ...
## $ IA_LABEL : Factor w/ 2 levels "CompetitorHigh ",..: 1 2 2 1 1 2 1 1 2 2 ...
## $ Prob : num 0.1 0.0333 0.1034 0.0345 0.3 ...
## $ X1 : num -0.264 -0.264 -0.264 -0.264 -0.264 ...
## $ X2 : num 0.325 0.325 0.325 0.325 0.325 ...
## $ X3 : num -0.357 -0.357 -0.357 -0.357 -0.357 ...
## $ X4 : num 0.367 0.367 0.367 0.367 0.367 ...
## $ Model1Fitted: num 0.158 0.164 0.139 0.123 0.131 ...
## $ Model2Fitted: num 0.186 0.179 0.208 0.187 0.231 ...
## $ Model3Fitted: num 0.0575 0.0211 0.1535 0.0718 0.2844 ...
## $ Model4Fitted: num 0.0316 -0.0333 0.131 -0.0419 0.2882 ...
## $ Model5Fitted: num 0.10239 0.02159 0.12446 0.00349 0.29014 ...
## $ Model6Fitted: num 0.10156 0.02126 0.12398 0.00349 0.291 ...
## $ Model7Fitted: num 0.1034 0.0194 0.1276 -0.0133 0.3098 ...
Model 4 performs the best, though LMER is not very susceptible to corss-validation, so the addition of more polynomials might be leading to overfitting
Each model used above will be graphed by overlaying it on the real data. First, the plots for the first comparison will be created. For this, I use the model dataset. I put the data in long only format.
ModelLong<- model %>% gather(Model,Predictions,9:15)
#rename the models so that they are more clear
levels(ModelLong$Model) <- list(`Model 1 Intercept`="Model1Fitted", `Model 2 Linear`="Model2Fitted", `Model 3 Quadratic`="Model3Fitted", `Model 4 Cubic`="Model4Fitted", `Model 5 Quartic (not converged)`="Model5Fitted", `Model 6 Quartic (not converged)`="Model6Fitted", `Model 7 Quartic`="Model7Fitted")
#rename some of the colums
colnames(ModelLong)[3]<-"Interest Area"
ModelLong[,3]<-factor(ModelLong[,3])
#compress the data(find the average over subjects)
ForGraph1<-ModelLong %>% group_by(`Interest Area`,time, Model) %>%
summarise(`Fixation Probability`=mean(Prob),Predicted=mean(Predictions))
levels(ForGraph1$`Interest Area`) <- list(`High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ")
The code chunk below will plot and save the graph
#color palette for color-blind
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
#create plot1 for Experiment 1, which goes in the publication
g3<-ggplot(ForGraph1, aes(x=time-2025, y=`Fixation Probability`, shape=`Interest Area`, color=`Interest Area`)) +
geom_point(size=3) + geom_line(aes(y=Predicted), size=1)+facet_wrap(~Model, nrow=3, ncol=3)+
scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
ggtitle("Fixation Proportion for Models\n examining High vs. Low Competitors")+
theme_bw()+
scale_x_continuous(breaks=seq(0,1000,100), limits=c(0,1000), name="time 200ms after spoken word onset(ms)")+
theme(legend.text = element_text(size = 12))+
theme(legend.title = element_text(size=12, face="bold"))+
theme(axis.title.x = element_text(face="bold", size=15), axis.text.x= element_text(size=12, angle=90, vjust=0.5))+
theme(axis.title.y = element_text(face="bold", size=15), axis.text.y = element_text(size=12))+
theme(strip.text.x = element_text(size=12, face="bold"))+
theme(plot.title = element_text(lineheight=.8, face="bold", size=14))
g3
ggsave(
"Plot3Exp1.png",
g3,
width = 9.25,
height = 6.25,
dpi = 300
)
Session Info
library(beepr)
beep(8)
sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] beepr_1.2 caret_6.0-64 lattice_0.20-33
## [4] gridExtra_2.2.1 tidyr_0.4.1 doParallel_1.0.10
## [7] iterators_1.0.8 foreach_1.4.3 dplyr_0.4.3
## [10] lmerTest_2.0-30 lme4_1.1-11 Matrix_1.2-4
## [13] ggplot2_2.1.0
##
## loaded via a namespace (and not attached):
## [1] reshape2_1.4.1 splines_3.2.3 colorspace_1.2-6
## [4] htmltools_0.3 stats4_3.2.3 yaml_2.1.13
## [7] mgcv_1.8-12 survival_2.38-3 nloptr_1.0.4
## [10] foreign_0.8-66 DBI_0.3.1 RColorBrewer_1.1-2
## [13] audio_0.1-5 plyr_1.8.3 stringr_1.0.0
## [16] MatrixModels_0.4-1 munsell_0.4.3 gtable_0.2.0
## [19] codetools_0.2-14 evaluate_0.8.3 labeling_0.3
## [22] latticeExtra_0.6-28 knitr_1.12.3 SparseM_1.7
## [25] quantreg_5.21 pbkrtest_0.4-6 Rcpp_0.12.4
## [28] acepack_1.3-3.3 scales_0.4.0 formatR_1.3
## [31] Hmisc_3.17-2 digest_0.6.9 stringi_1.0-1
## [34] grid_3.2.3 tools_3.2.3 magrittr_1.5
## [37] lazyeval_0.1.10 Formula_1.2-1 cluster_2.0.3
## [40] car_2.1-1 MASS_7.3-45 assertthat_0.1
## [43] minqa_1.2.4 rmarkdown_0.9.5 R6_2.1.2
## [46] rpart_4.1-10 nnet_7.3-12 nlme_3.1-126