Data: 60 brief passages of Ancient Greek prose, from Herodotus to Plotinus, totalling 9,278 words and 3,863 distinct word forms.
The texts (in plain text format) are published here: [https://bitbucket.org/nevenjovanovic/hellenismos-hypostates/src/master/pos_txt/], directories p1, p2, p3.
The tokenized and cleaned-up XML version (words in w, punctuation in pc, names of source files as @id; combined diacritics and letters replaced with precomposed characters where necessary) is in the same repository: [https://bitbucket.org/nevenjovanovic/hellenismos-hypostates/src/master/pos_txt/tokenizedp/grctxt.xml].
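The clean-up described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build grctxt.xml. It only shows one plausible way to replace letter-plus-combining-diacritic sequences with precomposed characters (Unicode NFC) and to wrap tokens in w and pc. The wrapper element name "text", the placement of the id attribute on it, the sample identifier, and the tokenizing regular expression are all assumptions made for illustration.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET


def tokenize_to_xml(text, source_id):
    """Return an element holding <w> and <pc> children for one passage.

    Assumptions (not taken from the repository): the wrapper element is
    called "text" and carries the source file name in an "id" attribute.
    """
    # NFC replaces letter + combining-mark sequences with precomposed
    # characters wherever Unicode defines one.
    normalized = unicodedata.normalize("NFC", text)
    passage = ET.Element("text", {"id": source_id})
    # Simplified tokenizer: runs of word characters are words, every other
    # non-space character is punctuation. Elision marks and any combining
    # marks that have no precomposed form would need extra handling.
    for token in re.findall(r"\w+|[^\w\s]", normalized):
        tag = "w" if re.match(r"\w", token) else "pc"
        ET.SubElement(passage, tag).text = token
    return passage


if __name__ == "__main__":
    # "p1_001" is a hypothetical source file name.
    sample = tokenize_to_xml("λο\u0301γος, καὶ", "p1_001")
    print(ET.tostring(sample, encoding="unicode"))
```

Normalizing before tokenizing matters here because a stray combining accent is not a word character for the regular expression and would otherwise split a word in two.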
The words were sent to the online Morpheus parser at [http://morph.perseids.org/analysis/word?lang=grc&engine=morpheusgrc&word=], using the XQuery script [https://bitbucket.org/nevenjovanovic/hellenismos-hypostates/src/master/scripts/ParsePerseusGetHeadwordFromDB.xq].
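For readers who do not use XQuery, the request step can be approximated in Python. This is a hedged sketch, not a port of ParsePerseusGetHeadwordFromDB.xq: it only builds the request URL from the base address given above and, optionally, fetches the raw response. The function names and the timeout value are hypothetical, and the structure of the returned analysis is not assumed or parsed here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address as given in the text above; the word is appended to it.
MORPHEUS_BASE = (
    "http://morph.perseids.org/analysis/word"
    "?lang=grc&engine=morpheusgrc&word="
)


def morpheus_url(word):
    """Build the request URL for one word form (UTF-8, percent-encoded)."""
    return MORPHEUS_BASE + quote(word, safe="")


def unique_forms(words):
    """Drop repeated forms, keeping first-seen order, to avoid repeat requests."""
    return list(dict.fromkeys(words))


def fetch_analysis(word, timeout=30):
    """Fetch the raw response body for one word (requires network access)."""
    with urlopen(morpheus_url(word), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    for form in unique_forms(["καὶ", "λόγος", "καὶ"]):
        print(morpheus_url(form))
```

Deduplicating first is a design choice rather than a requirement: with 3,863 distinct forms among 9,278 words, it would cut the number of requests by more than half.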