Synopsis

The aim of this document is to provide all the analyses for Experiment 2 Overlap all 3. In this experiment, participants were presented with displays of 4 words, among which were the target, a high orthographic competitor, a low orthographic competitor, and a distractor. The structure of the document will be as follows:

Data Cleaning and Pre-processing
Exploratory Analyses
Growth Curve Analysis
Graphs

Libraries

Load the required libraries

library(ggplot2)
library(lme4)
library(lmerTest)
library(dplyr)
library(parallel)
library(doParallel)
library(tidyr)
library(gridExtra)
library(caret)

Data Loading

First, I load the data file indicating the trials that were removed for each subject.

deletedTrials<-read.csv("deleted.csv")

Now I load up the csv files with Exp1 data. There are 4 files (Exp2Part1 and Exp2Part2, Exp2Part3 and sub 11spe2). Not all colums in those files are required. Only the following colums will be loaded in R, so as to preserve memory space. The colums are: RECORDING_SESSION_LABEL, IA_FIRST_RUN_START_TIME, IA_FIRST_RUN_START_TIME, IA_FIRST_RUN_END_TIME, IA_SECOND_RUN_START_TIME, IA_SECOND_RUN_END_TIME, IA_THIRD_RUN_START_TIME, IA_THIRD_RUN_END_TIME, IA_LABEL, RESPONSE_ACC, IA_DWELL_TIME, TRIAL_INDEX, target, RESPONSE_RT,trialtype,target, list, oldnewitem

Part1 <- read.csv("Exp2Part1.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]

#now load up part 2
Part2 <- read.csv("Exp2Part2.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]

Part3 <- read.csv("Exp2Part3.csv", na.strings = c("."))[,c('RECORDING_SESSION_LABEL', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_START_TIME', 'IA_FIRST_RUN_END_TIME', 'IA_SECOND_RUN_START_TIME', 'IA_SECOND_RUN_END_TIME', 'IA_THIRD_RUN_START_TIME', 'IA_THIRD_RUN_END_TIME', 'IA_LABEL', 'RESPONSE_ACC', 'IA_DWELL_TIME', 'TRIAL_INDEX', 'target', 'RESPONSE_RT', 'trialtype','oldnewitem', 'MOUSE_TIME','DISPLAY_TIME')]

#combine part 1 and part 2 into one dataframe
EyeData<-rbind(Part1, Part2, Part3)

Data Cleaning and Pre-Processing

Right now we have a dataset with all the subjects that took part in Experiment 2. In the code below, I perform some preprocessing that aims to do the following: Remove bad trials (e.g., trials with a lot of off-screen fixations or trials in which the fixations don’t start at the center of the screen or trials in which the tracking of the eye was not very good. The decision about which trials should be removed was done by two graduate students, Stas and Julie.) The deleted trials can be found in the deletedTrials file.

#this part of the code is not very effective; need to figure out later how to optimize it.
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="17spe2" & EyeData$TRIAL_INDEX==45),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="15spe2" & EyeData$TRIAL_INDEX==85),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="9sre2" & EyeData$TRIAL_INDEX==21),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="1tle2" & EyeData$TRIAL_INDEX==69),]
EyeData<-EyeData[!c(EyeData$RECORDING_SESSION_LABEL=="9spe2" & EyeData$TRIAL_INDEX==18),]
20/12652

## [1] 0.001580778

Only 5 trials have been deleted, which constitutes less than .158% of the total number of trials.

In the code below, I transform the NAs to zero, calculate the RT from the onset of the word display till the mouse click time is made.

#change all the NAs to zeros
EyeData[is.na(EyeData)] <- 0
#Calculate RT from the display onset until the mouse click is made
EyeData$RESPONSE_RT<-EyeData$MOUSE_TIME-EyeData$DISPLAY_TIME
#check the dimentions of the dataset
dim(EyeData)

## [1] 12652    18

#remove the MOUSE_TIME and DISPLAY_TIME variables
EyeData<-EyeData %>% select (1:16)

#rename the RECORDING_SESSION_LABEL into Subject
colnames(EyeData)[1] <- "Subject"
#examine colum names
names(EyeData)

##  [1] "Subject"                   "IA_FIRST_RUN_START_TIME"  
##  [3] "IA_FIRST_RUN_START_TIME.1" "IA_FIRST_RUN_END_TIME"    
##  [5] "IA_SECOND_RUN_START_TIME"  "IA_SECOND_RUN_END_TIME"   
##  [7] "IA_THIRD_RUN_START_TIME"   "IA_THIRD_RUN_END_TIME"    
##  [9] "IA_LABEL"                  "RESPONSE_ACC"             
## [11] "IA_DWELL_TIME"             "TRIAL_INDEX"              
## [13] "target"                    "RESPONSE_RT"              
## [15] "trialtype"                 "oldnewitem"

Examine the structure of the dataframe

str(EyeData)

## 'data.frame':    12652 obs. of  16 variables:
##  $ Subject                  : Factor w/ 36 levels "10spe2","11spe2",..: 8 8 8 8 8 8 8 8 8 8 ...
##  $ IA_FIRST_RUN_START_TIME  : num  2775 2497 0 0 2974 ...
##  $ IA_FIRST_RUN_START_TIME.1: num  2775 2497 0 0 2974 ...
##  $ IA_FIRST_RUN_END_TIME    : num  3962 2726 0 0 4030 ...
##  $ IA_SECOND_RUN_START_TIME : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ IA_SECOND_RUN_END_TIME   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ IA_THIRD_RUN_START_TIME  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ IA_THIRD_RUN_END_TIME    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ IA_LABEL                 : Factor w/ 4 levels "CompetitorHigh ",..: 3 1 2 4 3 1 2 4 3 1 ...
##  $ RESPONSE_ACC             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ IA_DWELL_TIME            : int  1170 210 0 0 1057 223 451 176 950 0 ...
##  $ TRIAL_INDEX              : int  1 1 1 1 2 2 2 2 3 3 ...
##  $ target                   : Factor w/ 88 levels "bait","bead",..: 41 41 41 41 8 8 8 8 63 63 ...
##  $ RESPONSE_RT              : num  2249 2249 2249 2249 2312 ...
##  $ trialtype                : Factor w/ 4 levels "Exp","FillerRel",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ oldnewitem               : Factor w/ 3 levels "New","Old","x": 3 3 3 3 3 3 3 3 3 3 ...

Subset the experimental trials. This will remove all other trialtypes, such as FillerRel, FillerUnrel and Practice trials

EyeData <- EyeData %>% filter(trialtype=="Exp")

Check subject accuracy

SubjectAccuracy<- EyeData %>% group_by(Subject) %>%
    summarise(MeanAccuracy=mean(RESPONSE_ACC)) %>% 
    arrange(MeanAccuracy)
SubjectAccuracy

## Source: local data frame [36 x 2]
## 
##    Subject MeanAccuracy
##     (fctr)        (dbl)
## 1    3spe2    0.8888889
## 2    8sre2    0.9259259
## 3   15spe2    0.9615385
## 4   14spe2    0.9629630
## 5   10spe2    1.0000000
## 6   11spe2    1.0000000
## 7    1spe2    1.0000000
## 8    1sse2    1.0000000
## 9    2sre2    1.0000000
## 10   2sse2    1.0000000
## ..     ...          ...

mean(SubjectAccuracy$MeanAccuracy)

## [1] 0.9927588

Subject accuracy is very high. Allmost all the subjects have 100% accuracy. Only 4 participants have a few incorrect trials. One subject however has relatively low accuracy (.89)

Create several variables that examine individual differences. Divide the subjects into fast or slow based on their RT performance for experimental trials.

r # library(plyr) # EyeData<- ddply(EyeData, c("Subject"), #              transform, SubjectRT=mean(RESPONSE_RT)) # # #create a new variable that has the levels fast or slow. Participants below the median are fast and above are slow # #the median is # EyeData$SubjectType<-ifelse(EyeData$SubjectRT>=median(EyeData$SubjectRT), "Slow","Fast") # detach("package:plyr")

Remove trials with incorrect accuracy

CorrectEyeData<-EyeData %>% filter(RESPONSE_ACC==1)
#number of rows deleted
nrow(EyeData)-nrow(CorrectEyeData) #28 rows get deleted, which corresponds to 7 incorrect trials

## [1] 28

28/3868 #.72% of experimental trials

## [1] 0.007238883

Set up the start time and the end time dummy-coding of each run in the interest area. This creates colums where if a fixation exists during a run, then the run gets 1 (exists) or if there is no fixation in the IA during a run, then the coding is 0 (not exist)

CorrectEyeData$Fststart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)), 0) 
CorrectEyeData$Fstend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_FIRST_RUN_END_TIME)), 0)
CorrectEyeData$Secstart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)), 0)
CorrectEyeData$Secend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_SECOND_RUN_END_TIME)), 0)
CorrectEyeData$Thirdstart <- ifelse(as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)), 0)
CorrectEyeData$Thirdend <- ifelse(as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_START_TIME)) > 0, as.numeric(as.character(CorrectEyeData$IA_THIRD_RUN_END_TIME)), 0)

Generate time bins from time 0 to time 6000 in 25 ms bins and assign them to the dataset

time <- seq(0, 6000, by=25)
tmatrix <- matrix(nrow=nrow(CorrectEyeData), ncol=length(time))
dim(tmatrix)

## [1] 3840  241

Generate time vectors for each row and column for first, second, and third pass viewings so that viewing periods receive a viewing probability value of 1

registerDoParallel(3)
for(i in 1:nrow(tmatrix)) {
for(j in 1:length(time)) {

tmatrix[i,j] <-  ifelse(CorrectEyeData$Fststart[i] < time[j] & 
                CorrectEyeData$Fstend[i] > time[j] |CorrectEyeData$Secstart[i] <
                time[j] & CorrectEyeData$Secend[i] > time[j] | CorrectEyeData$Thirdstart[i] 
                < time[j] & CorrectEyeData$Thirdend[i]>time[j], 1,0)
} 
}

Combine the CleanEyeData with the time matrix

CleanData <- cbind(CorrectEyeData, data.frame(tmatrix))

Assign time values to time bin columns

colnames(CleanData)[23:263] <- seq(0, 6000, by=25)

Subset the dataset with only the necessary colums and remove anything in the memory that might be a memory hog

CleanData <- CleanData[, -c(2:8,10,11,12,13,14, 17:22)]

Put the data in long-format and then calculate the proportion for each time bin for each subject

CleanData<- CleanData %>% gather(time,value,5:245)
#find proportion for each interest area for each subject
CleanDataProb<-CleanData %>% group_by(Subject,IA_LABEL,time) %>%
    summarise(Prob=mean(value))
CleanDataProb$time<-as.numeric(as.character(CleanDataProb$time))

Exploratory Graphs

The aim of this section is to explore pattern in the data without commiting to any formal modeling.

which Competitor is the stronger one?

#subset the required dataset by averaging over subjects; find the usual
OnlySubjects<-CleanData %>%
    group_by(IA_LABEL, time) %>%
    summarise(n=n(),Prob=mean(value),sd=sd(value)) %>%
    mutate(se=sd/sqrt(n), LCI=Prob+qnorm(0.025)*se, UCI=Prob+qnorm(0.975)*se)
OnlySubjects$time<-as.numeric(as.character(OnlySubjects$time))

levels(OnlySubjects$IA_LABEL) <- list(Target="TARGET_IA ", `High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ", `Distractor`="UNREL2_IA ")

colnames(OnlySubjects)[1]<-"Interest Area"

#plot for time 0 to time 6000
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
g1<-ggplot(OnlySubjects, aes(x=time, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
    geom_point(size=3) +
    geom_errorbar(aes(ymin=LCI, ymax=UCI)) +
    scale_x_continuous(breaks=seq(0,6000,500), limits=c(0,6000), name="time from trial onset (ms)")+
    geom_vline(xintercept = 1825)+
    geom_vline(xintercept = 3425)+theme_bw()+
    scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
    scale_y_continuous(name="Fixation Proportion")+
    theme(legend.text = element_text(size = 12))+
    theme(legend.title = element_text(size=12, face="bold"))+
    theme(legend.title = element_text(size=12, face="bold"))+
    theme(axis.title.x = element_text(face="bold", size=15), axis.text.x  = element_text(size=10))+
    theme(axis.title.y = element_text(face="bold", size=15), axis.text.y  = element_text(size=10))+
    theme(strip.text.x = element_text(size=12, face="bold"))+
    theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+theme_bw()

g1

ggsave(
  "Plot1Exp2.png",
  g1,
  width = 9.25,
  height = 6.25,
  dpi = 300
)

    
OnlySubjects$time2<-OnlySubjects$time-1825

g2<-ggplot(OnlySubjects, aes(x=time2, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
    geom_point(size=3) +
    geom_errorbar(aes(ymin=LCI, ymax=UCI))+
    scale_x_continuous(breaks=seq(0,1300,50), limits=c(0,1300), name="time from spoken target onset(ms)")+
    theme_bw()+
    scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
    scale_y_continuous(name="Fixation Proportion")+
    theme(legend.text = element_text(size = 12))+
    theme(legend.title = element_text(size=12, face="bold"))+
    theme(legend.title = element_text(size=12, face="bold"))+
    theme(axis.title.x = element_text(face="bold", size=15), axis.text.x  = element_text(size=12, angle=90, vjust=0.5))+
    theme(axis.title.y = element_text(face="bold", size=15), axis.text.y  = element_text(size=12))+
    theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+
    theme(strip.text.x = element_text(size=12, face="bold"))

g2

#the plots shows some warnings because it removes values before display onset and after display onset

ggsave(
  "Plot2Exp2.png",
  g2,
  width = 9.25,
  height = 6.25,
  dpi = 300
)

Create Graph for individual differences

#subset the required dataset by averaging over subjects; find the usual
# OnlySubjects<-CleanData %>%
#     group_by(IA_LABEL, time, SubjectType) %>%
#     summarise(n=n(),Prob=mean(value),sd=sd(value)) %>%
#     mutate(se=sd/sqrt(n), LCI=Prob+qnorm(0.025)*se, UCI=Prob+qnorm(0.975)*se)
# OnlySubjects$time<-as.numeric(as.character(OnlySubjects$time))
# 
# levels(OnlySubjects$IA_LABEL) <- list(Target="TARGET_IA ", `High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ", `Distractor`="UNREL2_IA ")
# 
# colnames(OnlySubjects)[1]<-"Interest Area"

# #plot for time 0 to time 6000
# cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# 
# OnlySubjects$time2<-OnlySubjects$time-1825
# 
# individual<-ggplot(OnlySubjects, aes(x=time2, y=Prob, color=`Interest Area`, shape=`Interest Area`)) +
#     geom_point(size=3) +
#     geom_errorbar(aes(ymin=LCI, ymax=UCI))+
#     scale_x_continuous(breaks=seq(0,1300,50), limits=c(0,1300), name="time from spoken target onset(ms)")+
#     theme_bw()+
#     scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
#     scale_y_continuous(name="Fixation Proportion")+
#     theme(legend.text = element_text(size = 12))+
#     theme(legend.title = element_text(size=12, face="bold"))+
#     theme(legend.title = element_text(size=12, face="bold"))+
#     theme(axis.title.x = element_text(face="bold", size=15), axis.text.x  = element_text(size=12, angle=90, vjust=0.5))+
#     theme(axis.title.y = element_text(face="bold", size=15), axis.text.y  = element_text(size=12))+
#     theme(plot.title = element_text(lineheight=.8, face="bold", size=14))+
#     theme(strip.text.x = element_text(size=12, face="bold"))+facet_wrap(~SubjectType)
# 
# individual
# 
# #the plots shows some warnings because it removes values before display onset and after display onset
# 
# ggsave(
#   "IndividualDifferences.png",
#   individual,
#   width = 9.25,
#   height = 6.25,
#   dpi = 300
# )
# 
# Fast subjects seem to identify the target quicker, but otherwise there seem to be no differences in terms of how big the overlap effects shows up.

Growth Curve Analyses

The following analyses will examine 1. the proportion fixations between the High and Low Competitor

High vs. Low Competitor

First, the required subset is created:

model<-CleanDataProb %>% 
    filter(IA_LABEL=="CompetitorHigh " | IA_LABEL=="CompetitorLow ", time>=2025 & time <=3025)
model$IA_LABEL<-as.factor(as.character(model$IA_LABEL))

#Generate time polynomials up to quadratic poly.
t25 <- data.frame(poly(unique(model$time),4))
t25$time <- seq(2025, 3025, by=25)

#add polynomials to data frame
model <- merge(model , t25, by="time")
head(model)

##   time Subject        IA_LABEL       Prob         X1        X2         X3
## 1 2025  10spe2 CompetitorHigh  0.14814815 -0.2639818 0.3245611 -0.3568831
## 2 2025  10spe2  CompetitorLow  0.11111111 -0.2639818 0.3245611 -0.3568831
## 3 2025  11spe2  CompetitorLow  0.14814815 -0.2639818 0.3245611 -0.3568831
## 4 2025   1spe2 CompetitorHigh  0.11111111 -0.2639818 0.3245611 -0.3568831
## 5 2025   1sse2 CompetitorHigh  0.14814815 -0.2639818 0.3245611 -0.3568831
## 6 2025   2sre2  CompetitorLow  0.07407407 -0.2639818 0.3245611 -0.3568831
##          X4
## 1 0.3669381
## 2 0.3669381
## 3 0.3669381
## 4 0.3669381
## 5 0.3669381
## 6 0.3669381

str(model)

## 'data.frame':    2952 obs. of  8 variables:
##  $ time    : num  2025 2025 2025 2025 2025 ...
##  $ Subject : Factor w/ 36 levels "10spe2","11spe2",..: 1 1 2 3 4 5 6 7 7 8 ...
##  $ IA_LABEL: Factor w/ 2 levels "CompetitorHigh ",..: 1 2 2 1 1 2 1 1 2 2 ...
##  $ Prob    : num  0.148 0.111 0.148 0.111 0.148 ...
##  $ X1      : num  -0.264 -0.264 -0.264 -0.264 -0.264 ...
##  $ X2      : num  0.325 0.325 0.325 0.325 0.325 ...
##  $ X3      : num  -0.357 -0.357 -0.357 -0.357 -0.357 ...
##  $ X4      : num  0.367 0.367 0.367 0.367 0.367 ...

The following models are run, with increasing complexity:

registerDoParallel(3)
#base model; Model1
Model1<-lmer(Prob ~ IA_LABEL + 
                 (1| Subject) + 
                 (1| Subject:IA_LABEL),
               data=model, REML=T)
#linear; model 2
Model2<-lmer(Prob ~ IA_LABEL*X1 + 
                 (1+X1| Subject) + 
                 (1+X1| Subject:IA_LABEL),
               data=model, REML=T)
#linea+quadratic; model 3
Model3<-lmer(Prob ~ IA_LABEL*(X1+X2) + 
                 (1+X1+X2| Subject) + 
                 (1+X1+X2| Subject:IA_LABEL),
               data=model, REML=T)
#Model 4
Model4<-lmer(Prob ~ IA_LABEL*(X1+X2+X3) + 
                 (1+X1+X2+X3| Subject) + 
                 (1+X1+X2+X3| Subject:IA_LABEL),
               data=model, REML=T)
#Model 5 failed to converge
Model5<-lmer(Prob ~ IA_LABEL*(X1+X2+X3+X4) + 
                 (1+X1+X2+X3+X4| Subject) + 
                 (1+X1+X2+X3+X4| Subject:IA_LABEL),
               data=model, REML=T)

#compare the models based on ANOVA test
anova(Model1,Model2,Model3,Model4,Model5)

## Data: model
## Models:
## object: Prob ~ IA_LABEL + (1 | Subject) + (1 | Subject:IA_LABEL)
## ..1: Prob ~ IA_LABEL * X1 + (1 + X1 | Subject) + (1 + X1 | Subject:IA_LABEL)
## ..2: Prob ~ IA_LABEL * (X1 + X2) + (1 + X1 + X2 | Subject) + (1 + 
## ..2:     X1 + X2 | Subject:IA_LABEL)
## ..3: Prob ~ IA_LABEL * (X1 + X2 + X3) + (1 + X1 + X2 + X3 | Subject) + 
## ..3:     (1 + X1 + X2 + X3 | Subject:IA_LABEL)
## ..4: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 + X4 | 
## ..4:     Subject) + (1 + X1 + X2 + X3 + X4 | Subject:IA_LABEL)
##        Df      AIC     BIC logLik deviance   Chisq Chi Df Pr(>Chisq)    
## object  5  -6383.3 -6353.3 3196.6  -6393.3                              
## ..1    11  -7914.3 -7848.4 3968.2  -7936.3 1543.05      6  < 2.2e-16 ***
## ..2    19  -8964.0 -8850.2 4501.0  -9002.0 1065.71      8  < 2.2e-16 ***
## ..3    29  -9723.4 -9549.6 4890.7  -9781.4  779.32     10  < 2.2e-16 ***
## ..4    41 -10219.9 -9974.3 5151.0 -10301.9  520.59     12  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#compare the models based on R^2 based on a paper from Xu (http://onlinelibrary.wiley.com/doi/10.1002/sim.1572/abstract)
1-var(residuals(Model1))/(var(model.response(model.frame(Model1))))

## [1] 0.1417364

1-var(residuals(Model2))/(var(model.response(model.frame(Model2))))

## [1] 0.5492991

1-var(residuals(Model3))/(var(model.response(model.frame(Model3))))

## [1] 0.7185989

1-var(residuals(Model4))/(var(model.response(model.frame(Model4))))

## [1] 0.8057794

1-var(residuals(Model5))/(var(model.response(model.frame(Model5))))

## [1] 0.8502332

#find the summary for model 4
summary(Model5)

## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: Prob ~ IA_LABEL * (X1 + X2 + X3 + X4) + (1 + X1 + X2 + X3 + X4 |  
##     Subject) + (1 + X1 + X2 + X3 + X4 | Subject:IA_LABEL)
##    Data: model
## 
## REML criterion at convergence: -10240.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9014 -0.5771  0.0169  0.6106  4.6871 
## 
## Random effects:
##  Groups           Name        Variance  Std.Dev. Corr                   
##  Subject:IA_LABEL (Intercept) 0.0006814 0.02610                         
##                   X1          0.0658951 0.25670  -0.24                  
##                   X2          0.0293725 0.17138  -0.28 -0.07            
##                   X3          0.0226513 0.15050   0.22 -0.61 -0.05      
##                   X4          0.0122534 0.11069   0.04  0.02 -0.80  0.05
##  Subject          (Intercept) 0.0003294 0.01815                         
##                   X1          0.0281716 0.16784   0.14                  
##                   X2          0.0017371 0.04168   0.17 -0.51            
##                   X3          0.0012596 0.03549  -0.60 -0.67 -0.22      
##                   X4          0.0004017 0.02004   0.22 -0.02  0.87 -0.62
##  Residual                     0.0012364 0.03516                         
## Number of obs: 2952, groups:  Subject:IA_LABEL, 72; Subject, 36
## 
## Fixed effects:
##                            Estimate Std. Error        df t value Pr(>|t|)
## (Intercept)                0.144326   0.005377 63.630000  26.839  < 2e-16
## IA_LABELCompetitorLow     -0.014456   0.006287 36.180000  -2.299   0.0274
## X1                        -0.134215   0.051452 64.380000  -2.609   0.0113
## X2                        -0.152413   0.029975 69.810000  -5.085 2.96e-06
## X3                         0.047888   0.026430 69.830000   1.812   0.0743
## X4                         0.002626   0.019644 69.950000   0.134   0.8940
## IA_LABELCompetitorLow :X1 -0.066810   0.061070 38.800000  -1.094   0.2807
## IA_LABELCompetitorLow :X2  0.027994   0.041237 61.880000   0.679   0.4998
## IA_LABELCompetitorLow :X3 -0.013211   0.036429 65.580000  -0.363   0.7180
## IA_LABELCompetitorLow :X4  0.004272   0.027376 63.700000   0.156   0.8765
##                              
## (Intercept)               ***
## IA_LABELCompetitorLow     *  
## X1                        *  
## X2                        ***
## X3                        .  
## X4                           
## IA_LABELCompetitorLow :X1    
## IA_LABELCompetitorLow :X2    
## IA_LABELCompetitorLow :X3    
## IA_LABELCompetitorLow :X4    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##               (Intr) IA_LAB X1     X2     X3     X4     IA_LABELCL:X1
## IA_LABELCmL   -0.585                                                 
## X1            -0.120  0.139                                          
## X2            -0.194  0.185 -0.120                                   
## X3             0.095 -0.146 -0.564 -0.061                            
## X4             0.048 -0.023  0.010 -0.686  0.021                     
## IA_LABELCL:X1  0.137 -0.234 -0.593  0.047  0.406 -0.011              
## IA_LABELCL:X2  0.157 -0.269  0.040 -0.688  0.036  0.523 -0.068       
## IA_LABELCL:X3 -0.124  0.212  0.350  0.036 -0.689 -0.032 -0.590       
## IA_LABELCL:X4 -0.019  0.033 -0.009  0.517 -0.032 -0.697  0.015       
##               IA_LABELCL:X2 IA_LABELCL:X3
## IA_LABELCmL                              
## X1                                       
## X2                                       
## X3                                       
## X4                                       
## IA_LABELCL:X1                            
## IA_LABELCL:X2                            
## IA_LABELCL:X3 -0.052                     
## IA_LABELCL:X4 -0.751         0.046

#find the RMSE
sqrt(mean((model$Prob-fitted(Model5))^2))

## [1] 0.03313396

#attach the fitted values to the dataset model, so that they can be used later for graphing purposes
model$Model1Fitted<-fitted(Model1)
model$Model2Fitted<-fitted(Model2)
model$Model3Fitted<-fitted(Model3)
model$Model4Fitted<-fitted(Model4)
model$Model5Fitted<-fitted(Model5)


#check that the values are in the dataset
str(model)

## 'data.frame':    2952 obs. of  13 variables:
##  $ time        : num  2025 2025 2025 2025 2025 ...
##  $ Subject     : Factor w/ 36 levels "10spe2","11spe2",..: 1 1 2 3 4 5 6 7 7 8 ...
##  $ IA_LABEL    : Factor w/ 2 levels "CompetitorHigh ",..: 1 2 2 1 1 2 1 1 2 2 ...
##  $ Prob        : num  0.148 0.111 0.148 0.111 0.148 ...
##  $ X1          : num  -0.264 -0.264 -0.264 -0.264 -0.264 ...
##  $ X2          : num  0.325 0.325 0.325 0.325 0.325 ...
##  $ X3          : num  -0.357 -0.357 -0.357 -0.357 -0.357 ...
##  $ X4          : num  0.367 0.367 0.367 0.367 0.367 ...
##  $ Model1Fitted: num  0.147 0.154 0.124 0.152 0.117 ...
##  $ Model2Fitted: num  0.173 0.206 0.171 0.104 0.167 ...
##  $ Model3Fitted: num  0.1926 0.0575 0.1382 0.0534 0.1276 ...
##  $ Model4Fitted: num  0.1187 0.0457 0.1472 0.1246 0.1575 ...
##  $ Model5Fitted: num  0.12 0.158 0.151 0.108 0.177 ...

Model 5 performs the best, though LMER is not very susceptible to corss-validation, so the addition of more polynomials might be leading to overfitting

Graphs

Each model used above will be graphed by overlaying it on the real data. First, the plots for the first comparison will be created. For this, I use the model dataset. I put the data in long only format.

ModelLong<- model %>% gather(Model,Predictions,9:13)
#rename the models so that they are more clear
levels(ModelLong$Model) <- list(`Model 1 Intercept`="Model1Fitted", `Model 2 Linear`="Model2Fitted", `Model 3 Quadratic`="Model3Fitted", `Model 4 Cubic`="Model4Fitted", `Model 5 Quartic`="Model5Fitted")

#rename some of the colums
colnames(ModelLong)[3]<-"Interest Area"
ModelLong[,3]<-factor(ModelLong[,3])
#compress the data(find the average over subjects)
ForGraph1<-ModelLong %>% group_by(`Interest Area`,time, Model) %>%
    summarise(`Fixation Probability`=mean(Prob),Predicted=mean(Predictions))

levels(ForGraph1$`Interest Area`) <- list(`High Competitor`="CompetitorHigh ", `Low Competitor`="CompetitorLow ")

The code chunk below will plot and save the graph

#color palette for color-blind
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
#create plot1 for Experiment 1, which goes in the publication
g3<-ggplot(ForGraph1, aes(x=time-2025, y=`Fixation Probability`, shape=`Interest Area`, color=`Interest Area`)) +
    geom_point(size=3) + geom_line(aes(y=Predicted), size=1)+facet_wrap(~Model, nrow=3, ncol=3)+
    scale_fill_manual(values=cbPalette)+scale_colour_manual(values=cbPalette)+
    ggtitle("Fixation Proportion for Models\n examining High vs. Low Competitors")+
    theme_bw()+
    scale_x_continuous(breaks=seq(0,1000,100), limits=c(0,1000), name="time 200ms after spoken word onset(ms)")+
    theme(legend.text = element_text(size = 12))+
    theme(legend.title = element_text(size=12, face="bold"))+
    theme(axis.title.x = element_text(face="bold", size=15), axis.text.x= element_text(size=12, angle=90, vjust=0.5))+
    theme(axis.title.y = element_text(face="bold", size=15), axis.text.y  = element_text(size=12))+
    theme(strip.text.x = element_text(size=12, face="bold"))+
    theme(plot.title = element_text(lineheight=.8, face="bold", size=14))

g3

ggsave(
  "Plot3Exp2.png",
  g3,
  width = 9.25,
  height = 6.25,
  dpi = 300
)

Session Info

sessionInfo()

## R version 3.2.3 (2015-12-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] caret_6.0-64      lattice_0.20-33   gridExtra_2.2.1  
##  [4] tidyr_0.4.1       doParallel_1.0.10 iterators_1.0.8  
##  [7] foreach_1.4.3     dplyr_0.4.3       lmerTest_2.0-30  
## [10] lme4_1.1-11       Matrix_1.2-4      ggplot2_2.1.0    
## 
## loaded via a namespace (and not attached):
##  [1] reshape2_1.4.1      splines_3.2.3       colorspace_1.2-6   
##  [4] htmltools_0.3       stats4_3.2.3        yaml_2.1.13        
##  [7] mgcv_1.8-12         survival_2.38-3     nloptr_1.0.4       
## [10] foreign_0.8-66      DBI_0.3.1           RColorBrewer_1.1-2 
## [13] plyr_1.8.3          stringr_1.0.0       MatrixModels_0.4-1 
## [16] munsell_0.4.3       gtable_0.2.0        codetools_0.2-14   
## [19] evaluate_0.8.3      labeling_0.3        latticeExtra_0.6-28
## [22] knitr_1.12.3        SparseM_1.7         quantreg_5.21      
## [25] pbkrtest_0.4-6      Rcpp_0.12.4         acepack_1.3-3.3    
## [28] scales_0.4.0        formatR_1.3         Hmisc_3.17-2       
## [31] digest_0.6.9        stringi_1.0-1       grid_3.2.3         
## [34] tools_3.2.3         magrittr_1.5        lazyeval_0.1.10    
## [37] Formula_1.2-1       cluster_2.0.3       car_2.1-1          
## [40] MASS_7.3-45         assertthat_0.1      minqa_1.2.4        
## [43] rmarkdown_0.9.5     R6_2.1.2            rpart_4.1-10       
## [46] nnet_7.3-12         nlme_3.1-126

Experiment 2 All 3

Stas Sajin and Julie Gregg

March 31, 2016