romanegloo/we_eval_sets.md

## we_eval_sets.md

      
Evaluation Datasets for Word Embeddings

SimVerb-3500 [download]

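Semantic similarity of verb pairs. Each sample row lists the two verbs, the part of speech, the human similarity rating, and the lexical relation between them.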

[sample]
hurt    offend          V       6.81    SYNONYMS
clarify worry           V       0.33    NONE
fasten  attach          V       8.47    HYPER/HYPONYMS
meet    introduce       V       2.82    NONE
throw   kick            V       1.66    COHYPONYMS
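
Similarity datasets of this shape are usually scored by correlating a model's similarity values with the human ratings, most often with Spearman's rank correlation. The following is a minimal standard-library sketch over the five sample rows above. The helper names are illustrative, and `model_scores` is a made-up placeholder standing in for the cosine similarities a real embedding would produce.

```python
import math

SAMPLE = """\
hurt    offend          V       6.81    SYNONYMS
clarify worry           V       0.33    NONE
fasten  attach          V       8.47    HYPER/HYPONYMS
meet    introduce       V       2.82    NONE
throw   kick            V       1.66    COHYPONYMS
"""


def parse_simverb(text):
    """Parse whitespace-separated rows into (word1, word2, pos, rating, relation)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        w1, w2, pos, rating, relation = fields
        rows.append((w1, w2, pos, float(rating), relation))
    return rows


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined if either input is constant


rows = parse_simverb(SAMPLE)
human = [r[3] for r in rows]
model_scores = [0.62, 0.10, 0.71, 0.35, 0.40]  # hypothetical model output
rho = spearman(human, model_scores)
print(round(rho, 3))  # 0.9
```

With real data, `model_scores` would hold one similarity value per pair in file order, and pairs containing out-of-vocabulary words would need to be handled explicitly (skipped or assigned a default).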

The MEN Test Collection [download]

Semantic similarity and relatedness. Human judgements were collected from Mechanical Turk workers, who were shown two word pairs at a time and chose the pair whose words were more closely related.
[sample]
automobile car 50.000000
river water 49.000000
stairs staircase 49.000000
...
movie shopping 22.000000
shade twigs 22.000000
frost sunny 22.000000
...
muscle tulip 1.000000
bikini pizza 1.000000
bakery zebra 0.000000

Rare Word Dataset (U of Cambridge) [download]

Evaluation set for rare-word representations. Its authors claim higher inter-annotator agreement (IAA) than other datasets such as Stanford RW and SimVerb-3500.


sleepwalking    somnambulists   3.88
2mro            tomorrow        4.00
currency        concurrency     0.13
must-see        interesting     3.06
carbinolamine   hemiaminal      3.88
biting_point    clutch          2.19
random_seed     BiLSTM          1.56
black_hole      blackmail       0.06


SimLex-999 [download]

Captures similarity rather than relatedness or association.


word1   word2        POS  SimLex999  conc(w1)  conc(w2)  concQ  Assoc(USF)  SimAssoc333  SD(SimLex)
old     new          A    1.58       2.72      2.81      2      7.25        1            0.41
smart   intelligent  A    9.2        1.75      2.46      1      7.11        1            0.67
hard    difficult    A    8.77       3.76      2.21      2      5.94        1            1.19
happy   cheerful     A    9.55       2.56      2.34      1      5.85        1            2.18
hard    easy         A    0.95       3.76      2.07      2      5.82        1            0.93
fast    rapid        A    8.75       3.32      3.07      2      5.66        1            1.68
happy   glad         A    9.17       2.56      2.36      1      5.49        1            1.59
short   long         A    1.23       3.61      3.18      2      5.36        1            1.58
stupid  dumb         A    9.58       1.75      2.36      1      5.26        1            1.48
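
The sample rows make the similarity-versus-association distinction concrete: antonym pairs such as old/new have a high Assoc(USF) value but a low SimLex999 score. The sketch below picks out such pairs. It assumes the distributed file is tab-separated with the header shown in the sample. The function name and both thresholds are arbitrary choices for illustration.

```python
import csv
import io

# Sample rows copied from above. They are re-joined with tabs below to mimic
# the assumed tab-separated file layout.
SAMPLE_ROWS = """\
word1 word2 POS SimLex999 conc(w1) conc(w2) concQ Assoc(USF) SimAssoc333 SD(SimLex)
old new A 1.58 2.72 2.81 2 7.25 1 0.41
smart intelligent A 9.2 1.75 2.46 1 7.11 1 0.67
hard difficult A 8.77 3.76 2.21 2 5.94 1 1.19
happy cheerful A 9.55 2.56 2.34 1 5.85 1 2.18
hard easy A 0.95 3.76 2.07 2 5.82 1 0.93
fast rapid A 8.75 3.32 3.07 2 5.66 1 1.68
happy glad A 9.17 2.56 2.36 1 5.49 1 1.59
short long A 1.23 3.61 3.18 2 5.36 1 1.58
stupid dumb A 9.58 1.75 2.36 1 5.26 1 1.48
"""
tsv_text = "\n".join("\t".join(line.split()) for line in SAMPLE_ROWS.splitlines())


def load_simlex(handle):
    """Read the pair, similarity, and association columns from a TSV handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "pair": (row["word1"], row["word2"]),
            "sim": float(row["SimLex999"]),
            "assoc": float(row["Assoc(USF)"]),
        }
        for row in reader
    ]


simlex_rows = load_simlex(io.StringIO(tsv_text))

# Strongly associated but dissimilar pairs (thresholds are illustrative only).
associated_not_similar = [
    r["pair"] for r in simlex_rows if r["assoc"] >= 5.0 and r["sim"] < 2.0
]
print(associated_not_similar)
# [('old', 'new'), ('hard', 'easy'), ('short', 'long')]
```

A model that tracks association rather than similarity would tend to score these antonym pairs highly, which is the kind of error this dataset is meant to expose.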


Google Analogy [download]

Unbalanced: 8,869 semantic and 10,675 syntactic questions, with 20-70 pairs per category. The country:capital relation accounts for over 50% of all semantic questions. The relations in the syntactic part are largely the same as in MSR.

Athens Greece Baghdad Iraq
Athens Greece Bangkok Thailand
Ashgabat Turkmenistan Conakry Guinea
Ashgabat Turkmenistan Copenhagen Denmark
Kabul Afghanistan Rabat Morocco
Kabul Afghanistan Riga Latvia
Croatia kuna Bulgaria lev
Croatia kuna Cambodia riel
sudden suddenly cheerful cheerfully
sudden suddenly complete completely
simple simpler sharp sharper
simple simpler short shorter
falling fell thinking thought
falling fell vanishing vanished
pear pears elephant elephants
pear pears eye eyes
write writes shuffle shuffles
write writes sing sings
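
Each line above is one question "a b c d", read as "a is to b as c is to d". A common way to score such questions is the vector-offset method: predict the word whose vector is closest (by cosine) to b - a + c, excluding the three query words, and count a hit when it equals d. The sketch below shows the idea. `TOY_VECTORS` is hand-made so that the offsets work out and is not taken from any trained model. The function names are illustrative.

```python
import math

# Hand-made 3-d vectors (hypothetical): capital -> country is an offset of
# +1 on the second coordinate.
TOY_VECTORS = {
    "Athens": [1.0, 0.0, 0.2],
    "Greece": [1.0, 1.0, 0.2],
    "Baghdad": [0.2, 0.0, 1.0],
    "Iraq": [0.2, 1.0, 1.0],
    "Bangkok": [0.6, 0.0, 0.6],
    "Thailand": [0.6, 1.0, 0.6],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Return the word closest to b - a + c, excluding a, b, and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def accuracy(questions, vectors):
    """Fraction of 'a b c d' question lines whose predicted word equals d."""
    correct = 0
    for line in questions:
        a, b, c, expected = line.split()
        if solve_analogy(a, b, c, vectors) == expected:
            correct += 1
    return correct / len(questions)


QUESTIONS = [
    "Athens Greece Baghdad Iraq",
    "Athens Greece Bangkok Thailand",
]
print(accuracy(QUESTIONS, TOY_VECTORS))  # 1.0
```

With a real vocabulary the candidate set is the whole vocabulary, so the search is normally done with a matrix product over normalized vectors rather than a Python loop. Questions containing out-of-vocabulary words also need an explicit policy.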

BATS [download]

(The link to the original website is no longer valid.)

dataset balanced across 4 types of relations (inflectional morphology, derivational morphology, lexicographic semantics, encyclopedic semantics)
10 relations of each type, 50 unique pairs per category
99,200 questions in total