This export contains a CSV file that can be used to test whether the subject is
paying attention to the words. The file is available in questions.csv.
The word vectors operate on a scale of 1 to 5, but this export only returns questions that all or the vast majority of Mechanical Turk users rated at one of the extremes (1 for always no, 5 for always yes). The answers are split approximately 50% yes and 50% no, so the subject cannot learn a response pattern.
The file looks similar to the following:
BEAR,WOULD YOU FIND IT IN THE FOREST?,YES
CAT,CAN YOU PET IT?,YES
COW,DOES IT LIVE IN WATER?,NO
DOG,DOES IT HAVE SEEDS?,NO
HORSE,CAN YOU SEE THROUGH IT?,NO
ARM,IS IT MANUFACTURED?,NO
EYE,DOES IT MAKE A SOUND?,NO
...
Each line has the following format:
word,question,answer
word,question,answer
word,question,answer
...
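As a minimal sketch of reading this format, the following Python parses the rows and reports the share of each answer. The function names load_questions and answer_balance are illustrative and not part of this export, and the sample rows are copied from the excerpt above rather than read from the real questions.csv.

```python
import csv
import io
from collections import Counter


def load_questions(source):
    """Read (word, question, answer) tuples from an open text stream."""
    rows = []
    for record in csv.reader(source):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        word, question, answer = (field.strip() for field in record)
        rows.append((word, question, answer))
    return rows


def answer_balance(rows):
    """Return the fraction of rows for each answer, e.g. YES and NO."""
    counts = Counter(answer for _, _, answer in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {answer: n / total for answer, n in counts.items()}


# Sample rows taken from the excerpt above; for the real file, pass
# open("questions.csv", newline="") instead of this in-memory stream.
sample = io.StringIO(
    "BEAR,WOULD YOU FIND IT IN THE FOREST?,YES\n"
    "CAT,CAN YOU PET IT?,YES\n"
    "COW,DOES IT LIVE IN WATER?,NO\n"
    "DOG,DOES IT HAVE SEEDS?,NO\n"
)
rows = load_questions(sample)
balance = answer_balance(rows)
```

On the full file, the two fractions should come out close to 0.5 each if the split is as described.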
The source code used to generate these is available in the ./source/ directory.
I don't anticipate this will be necessary, but it's there if you need it for
anything. Note that it requires Python 3 and SciPy to run properly; it can be
installed and run using the following:
pip3 install scipy # Install SciPy (provides the scipy.io module)
python3 ./question-extractor.py --vectors ./human218.mat --words ./60words.txt
It will output to standard out. You can configure it to output to a file
with the --output <filename> parameter if needed. The random number generator
is seeded, but the seed can be changed with the --seed <seed> parameter.
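To illustrate why a seeded generator matters, here is a minimal sketch of a reproducible, balanced selection. It is not the code in ./source/: the function pick_balanced, its parameters, and the candidate rows are hypothetical. It only shows that a fixed seed yields the same selection on every run, while a different seed may yield a different one.

```python
import random


def pick_balanced(candidates, per_answer, seed):
    """Pick per_answer YES rows and per_answer NO rows, reproducibly.

    Hypothetical helper for illustration; not the extractor's actual logic.
    """
    rng = random.Random(seed)  # private generator, global state untouched
    yes = [row for row in candidates if row[2] == "YES"]
    no = [row for row in candidates if row[2] == "NO"]
    chosen = rng.sample(yes, per_answer) + rng.sample(no, per_answer)
    rng.shuffle(chosen)  # mix YES and NO rows so order gives nothing away
    return chosen


# Candidate rows copied from the sample excerpt of questions.csv.
candidates = [
    ("BEAR", "WOULD YOU FIND IT IN THE FOREST?", "YES"),
    ("CAT", "CAN YOU PET IT?", "YES"),
    ("COW", "DOES IT LIVE IN WATER?", "NO"),
    ("DOG", "DOES IT HAVE SEEDS?", "NO"),
    ("HORSE", "CAN YOU SEE THROUGH IT?", "NO"),
]

first = pick_balanced(candidates, per_answer=2, seed=0)
second = pick_balanced(candidates, per_answer=2, seed=0)
```

Here first and second are identical because both calls use the same seed.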