This document describes the steps taken to analyze the esquiroles experiment and generates the appropriate figures.
One hundred Spanish words, all of them nouns, were selected from the EsPal database (Duchon et al., 2013). All were high-frequency words (M = 195 per million in the EsPal subtitle corpus; range: 104-452). The mean number of letters was 6.1 (range: 5-8), and the mean orthographic Levenshtein distance (OLD20) was 1.7 (range: 1.0-2.9). We also created 100 nonwords with Wuggy (Keuleers & Brysbaert, 2010), matched to the word stimuli on number of letters, number of syllables, subsyllabic structure, and transitional probabilities. The list of words and nonwords is available in the Appendix. For the sake of the main manipulation (item repetition), 50 filler words (all high-frequency: above 83 occurrences per million) and 50 filler nonwords were selected, similar in length and in the other psycholinguistic variables to the experimental stimuli. Two sets of materials (Set 1 and Set 2) were constructed and presented across Block 1 and Block 2. For both sets, Block 2 consisted of the same 200 experimental items (100 words and 100 nonwords). Half of those items (50 words and 50 nonwords) were randomly chosen for inclusion in Block 1 of Set 1 (together with the 50 filler words and 50 filler nonwords), and the other half were chosen for Set 2 (together with the 50 filler words and 50 filler nonwords). Thus, each block contained 200 items (100 words and 100 nonwords).
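For concreteness, here is a minimal sketch of the Set 1 / Set 2 construction described above (the item names are placeholders; the actual stimuli are listed in the Appendix):

```r
# Illustrative only: placeholder item names standing in for the actual stimuli.
set.seed(1)
words    <- paste0("word",    1:100)   # 100 experimental words
nonwords <- paste0("nonword", 1:100)   # 100 experimental nonwords

# Half of the experimental items are repeated in Block 1 of Set 1 ...
set1.words    <- sample(words, 50)
set1.nonwords <- sample(nonwords, 50)
# ... and the other half are repeated in Block 1 of Set 2.
set2.words    <- setdiff(words, set1.words)
set2.nonwords <- setdiff(nonwords, set1.nonwords)

# Block 2 is identical in both sets: all 200 experimental items.
block2 <- c(words, nonwords)
```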
Important: only the second block is analyzed here.
#uncomment below if needed
#install.packages("weights")
library(weights)
esquirol.raw <- read.table('raw.data',header=F)
DMDX codes error RTs as negative numbers; the next line converts them to positive values. The last few columns of the raw file, which contained the practice trials, are not used in the analyses below.
esquirol.raw1 <- abs(esquirol.raw)
The raw file has 200 columns (one per item), with rows 1-24 holding each subject's RTs and rows 25-48 the corresponding accuracies. The column order is (50 items each): repeated words, non-repeated words, repeated pseudowords, non-repeated pseudowords.
wrd <-c(rep("Wrds", 100), rep("Nonws",100))
rep <-c(rep("Rep",50),rep("Nrp",50),rep("Rep",50),rep("Nrp",50))
esquirol.all <- data.frame(SU= character(200*24), WN=character(200*24),RP=character(200*24),AC=numeric(4800), RT=numeric(4800))
esquirol.all[,1] <- as.character(esquirol.all[,1])
esquirol.all[,2] <- as.character(esquirol.all[,2])
esquirol.all[,3] <- as.character(esquirol.all[,3])
for(i in 1:24){        # subjects
  for(j in 1:200){     # items
    esquirol.all[(i-1) * 200 + j, 1] <- paste("Subj", i, sep="")              # subject ID
    esquirol.all[(i-1) * 200 + j, 2] <- as.character(wrd[j])                  # lexicality (Wrds/Nonws)
    esquirol.all[(i-1) * 200 + j, 3] <- as.character(rep[j])                  # repetition (Rep/Nrp)
    esquirol.all[(i-1) * 200 + j, 4] <- as.numeric(esquirol.raw1[24+i, j])    # accuracy (raw rows 25-48)
    esquirol.all[(i-1) * 200 + j, 5] <- as.numeric(esquirol.raw1[i, j])       # RT (raw rows 1-24)
  }
}
esquirol.all[,1]<-as.factor(esquirol.all[,1])
esquirol.all[,2]<-as.factor(esquirol.all[,2])
esquirol.all[,3]<-as.factor(esquirol.all[,3])
Apply RT cutoffs: trials with RTs outside the 250-1500 ms window are excluded.
culow <- 250
cuhigh <- 1500
esquirol.all1 <- esquirol.all[esquirol.all$RT > culow, 1:5]
esquirol.all1 <- esquirol.all1[esquirol.all1$RT < cuhigh, 1:5]
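As a quick sanity check (a minimal sketch using the objects just created), the proportion of trials removed by the cutoffs can be reported as follows:

```r
# Proportion of trials falling outside the 250-1500 ms window.
mean(esquirol.all$RT <= culow | esquirol.all$RT >= cuhigh)
```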
esquirol.all holds all the raw data in data-frame format; esquirol.all1 is the version with the cutoffs applied.
Mean RTs per condition aggregated by Subject
esquirol.byS <- aggregate(esquirol.all1$RT,list(esquirol.all1$WN, esquirol.all1$RP, esquirol.all1$SU,esquirol.all1$AC),mean, na.rm=T)
Accuracy aggregated by subject
esquirol.bySACC <- aggregate(esquirol.all1$AC,list(esquirol.all1$WN, esquirol.all1$RP, esquirol.all1$SU),mean, na.rm=T)
Now quantiles
p<-seq(.1,.9,.2)
esquirol.quant=matrix(ncol=5,nrow=length(esquirol.byS[,1]))
for(i in 1:5) esquirol.quant[,i]<- aggregate(esquirol.all1$RT,list(esquirol.all1$WN, esquirol.all1$RP, esquirol.all1$SU,esquirol.all1$AC),quantile,probs=p[i],na.rm=T)[,5]
Putting it all together: append the quantiles to the by-subject means.
esquirol.byS <- cbind(esquirol.byS, esquirol.quant)
Vincentiles: the subject-level quantiles are averaged across subjects within each condition.
esquirol.vinc <- aggregate(esquirol.byS[,5:10], list(esquirol.byS[,1], esquirol.byS[,2],esquirol.byS[,4]), mean)
esquirol.vincAcc <- aggregate(esquirol.bySACC[,4], list(esquirol.bySACC[,1], esquirol.bySACC[,2]), mean)
esquirol.vinc[,3] <- c(1-esquirol.vincAcc[,3], esquirol.vincAcc[,3])  # column 3 becomes the response probability: error rate for the AC=0 rows, accuracy for the AC=1 rows
Calculate the mean, SE, and CI for accuracy across subjects.
x<-cbind(esquirol.vincAcc, aggregate(esquirol.bySACC[,4], list(esquirol.bySACC[,1], esquirol.bySACC[,2]), sd)[,3])
x<-cbind(x,x[,4]/sqrt(24))
x<-cbind(x,x[,5]*1.96)
x<-cbind(x,x[,3],x[,5])
x[,7]<-(x[,3]-x[,6])
x[,8]<-(x[,3]+x[,6])
colnames(x) <- c("lex", "rep", "mean", "sd", "se", "ci.half", "CI-", "CI+")
format(x, digits=3)
Confidence intervals for differences (nonrep-rep) for words
words.acc<-esquirol.bySACC[esquirol.bySACC$Group.1=="Wrds",2:4]
wordsacc.t<-t.test(words.acc[,3] ~ words.acc[,1], paired=T)
wordsacc.t$conf.int
Confidence intervals for differences (nonrep-rep) for nonwords
nonwords.acc<-esquirol.bySACC[esquirol.bySACC$Group.1=="Nonws",2:4]
nonwordsacc.t<-t.test(nonwords.acc[,3] ~ nonwords.acc[,1], paired=T)
nonwordsacc.t$conf.int
This is the table of vincentiles and response probabilities used for the modeling
format(esquirol.vinc, digits=2)
The following lines generate a table for the PDF output but have no effect when knitting to Word:
library(xtable)
options(xtable.comment = FALSE)
options(xtable.booktabs = TRUE)
xtable(format(esquirol.vinc, digits=2), caption = "Vincentiles for Second block of Experiment")
#uncomment to generate pdf figure
#pdf("quantiles.pdf")
par(mai=c(1,1.1,1,.6))
# Data vincentiles: nonwords (non-repeated, repeated) at x = 1:2 ...
matplot(esquirol.vinc[c(5,7),5:9],las=1,col=1,pch=c("1","3","5","7","9"),axes=FALSE, xlim=c(0.7,4.3), type="b",lty=1, lwd=2, ylab="RT(ms)", ylim=c(410, 750))
# ... and words (non-repeated, repeated) at x = 3:4.
matlines(3:4, esquirol.vinc[c(6,8),5:9],pch=c("1","3","5","7","9"), type="b",lty=1, lwd=2, col=1)
box(lwd=4)
axis(1,at=1:4,labels=paste(esquirol.vinc[c(5,7,6,8),1],esquirol.vinc[c(5,7,6,8),2],"\n", rd(esquirol.vinc[c(5,7,6,8),3],digits=3)) ,tck=.02,lwd=2)
axis(2,las=1,lwd=2)
abline(v=2.5,lwd=4)
#dev.off()
## Diffusion model parameters

In the constrained model, only the drift rate $v$ and the non-decision time $T_{er}$ varied with repetition; the remaining parameters were held constant across conditions (blank cells repeat the values in the first row).

Stimuli | Repeated | $v$ (drift rate) | $a$ | $z$ | $T_{er}$ | | | | |
---|---|---|---|---|---|---|---|---|---|
Words | No | 0.32 | 0.10 | 0.054 | 0.406 | 0.034 | 0.002 | 0.14 | 0.005 |
Words | Yes | 0.36 | | | 0.400 | | | | |
Nonwords | No | -0.37 | | | 0.445 | | | | |
Nonwords | Yes | -0.31 | | | 0.439 | | | | |
The predictions of the model based on these parameters can be obtained at http://star.psy.ohio-state.edu/cogsys-web/Rpad/corq.Rpad. NOTE (July 7, 2017): the website only accepts positive drift rates, so to generate the model predictions for nonwords, switch the sign of the drift rate (-.37 becomes .37) and adjust the starting point from z to a - z (.10 - .054 = .046).
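For concreteness, a small sketch of that reflection for the nonword parameters (the values come from the table above; the `reflect` helper is purely illustrative):

```r
# Reflect the nonword process so the website's positive-drift convention can be used:
# switch the sign of the drift rate and mirror the starting point within the boundaries.
a <- 0.10
z <- 0.054
reflect <- function(v, a, z) c(v = -v, z = a - z)
reflect(-0.37, a, z)   # non-repeated nonwords: v = 0.37, z = 0.046
reflect(-0.31, a, z)   # repeated nonwords:     v = 0.31, z = 0.046
```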
# Model predictions: p = response probability, mrt = mean RT (ms), q1-q9 = the .1-.9 RT quantiles;
# rows are non-repeated nonwords, repeated nonwords, non-repeated words, repeated words.
dmodel<-read.table(header=T, text="
p mrt q1 q3 q5 q7 q9
0.966 586 476 526 566 614 714
0.942 599 476 530 574 630 749
0.968 543 428 478 519 570 682
0.979 525 418 467 505 551 649
")
Visual inspection of Figure 1 shows that the constrained diffusion model does a very good job capturing the data. In particular, the model nicely accounts for the RT data. Although, in general terms, the accuracy levels are also adequately accounted for, the model does a better job for words than for nonwords (for non-repeated words, data: `r format(esquirol.vinc[6,3], digits=3)` vs. model: `r dmodel$p[3]`; for repeated words, data: `r format(esquirol.vinc[8,3], digits=3)` vs. model: `r dmodel$p[4]`). For nonwords, the model predicts a reduction in accuracy of .024 with repetition, whereas in the data the reduction is only .002 (for non-repeated nonwords, data: `r format(esquirol.vinc[5,3], digits=3)` vs. model: `r dmodel$p[1]`; for repeated nonwords, data: `r format(esquirol.vinc[7,3], digits=3)` vs. model: `r dmodel$p[2]`).
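The numbers quoted above can be recomputed directly from the objects already defined (a quick check; rows 5, 7, 6, 8 of `esquirol.vinc` and rows 1-4 of `dmodel` correspond to non-repeated nonwords, repeated nonwords, non-repeated words, and repeated words, respectively):

```r
# Accuracy: data vs. model, ordered as non-repeated nonwords, repeated nonwords,
# non-repeated words, repeated words.
data.acc  <- esquirol.vinc[c(5, 7, 6, 8), 3]
model.acc <- dmodel$p
round(cbind(data = data.acc, model = model.acc), 3)

# Repetition-related drop in nonword accuracy: predicted (~.024) vs. observed (~.002).
model.acc[1] - model.acc[2]
data.acc[1] - data.acc[2]
```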
The behavior of the parameters is consistent with other applications of the diffusion model:
- Drift rate. The drift rate for both words and nonwords becomes more positive as a function of repetition. Recall that the drift rate maps onto the familiarity/wordlikeness dimension: for words, this helps performance, as repeated words become more familiar and evidence therefore accumulates toward the word boundary at a faster rate. For nonwords, on the other hand, this hinders performance, as the added familiarity makes the negative decision slower.
- Encoding time. The $T_{er}$ parameter suggests that repetition might facilitate the encoding process. This effect is rather small (6 ms for both words and nonwords; see the short check below), but it is worth mentioning because it matches the behavior of this parameter in priming tasks (Gomez, Perea, & Ratcliff, 2013).
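These repetition effects can be read off the parameter table directly; a small check (values copied from the table above, with the `v` and `Ter` vectors introduced only for this computation):

```r
# Drift rates and non-decision times (in seconds) from the parameter table.
v   <- c(w.nrp = 0.32,  w.rep = 0.36,  nw.nrp = -0.37, nw.rep = -0.31)
Ter <- c(w.nrp = 0.406, w.rep = 0.400, nw.nrp = 0.445, nw.rep = 0.439)

v["w.rep"]  - v["w.nrp"]    # +0.04: drift becomes more positive for words
v["nw.rep"] - v["nw.nrp"]   # +0.06: also more positive for nonwords (slowing the "no" decision)
(Ter["w.nrp"]  - Ter["w.rep"])  * 1000   # 6 ms encoding facilitation for words
(Ter["nw.nrp"] - Ter["nw.rep"]) * 1000   # 6 ms encoding facilitation for nonwords
```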
The interplay of the encoding time and the drift rate for nonwords produces an interesting pattern of results: the faster responses (the lower quantiles) show facilitation for repeated nonwords, but the slower responses (the higher quantiles) show inhibition for repeated nonwords.
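That crossover can be checked directly in the quantiles (a minimal sketch using the objects defined above; rows 5 and 7 of `esquirol.vinc` are the correct-response vincentiles for non-repeated and repeated nonwords, and rows 1-2 of `dmodel` are the corresponding model predictions):

```r
# Repeated minus non-repeated nonword quantiles: per the pattern described above,
# facilitation at the fast end and inhibition at the slow end.
round(esquirol.vinc[7, 5:9] - esquirol.vinc[5, 5:9], 1)   # data
dmodel[2, 3:7] - dmodel[1, 3:7]                           # model predictions
```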
#uncomment to generate pdf figure
#pdf("quantiles.pdf")
par(mai=c(1,1.1,1,.6))
matplot(esquirol.vinc[c(5,7),5:9],las=1,col=1,pch=c("1","3","5","7","9"),axes=FALSE, xlim=c(0.7,4.3), type="b",lty=1, lwd=2, ylab="RT(ms)", ylim=c(410, 750))
matlines(3:4, esquirol.vinc[c(6,8),5:9],pch=c("1","3","5","7","9"), type="b",lty=1, lwd=2, col=1)
# Model for words
matpoints(3:4,dmodel[3:4,3:7], type="b",lty=2, col=2,pch=1)
#axis(3,at=3:4,labels = paste("model fit \n", rd(dmodel[3:4,1],digits=3)), col=2)
#Model for nonwords
matpoints(1:2,dmodel[1:2,3:7], type="b",lty=2, col=2,pch=1)
#axis(3,at=1:2,labels = paste("model fit \n", rd(dmodel[1:2,1],digits=3)), col=2)
box(lwd=4)
axis(1,at=1:4,labels=paste(esquirol.vinc[c(5,7,6,8),1],esquirol.vinc[c(5,7,6,8),2],"\n", rd(esquirol.vinc[c(5,7,6,8),3],digits=3)) ,tck=.02,lwd=2)
axis(2,las=1,lwd=2)
abline(v=2.5,lwd=4)
#dev.off()