@kylepjohnson
kylepjohnson / avg_words_per_sentence_per_phi5_author_v3.py
Created September 21, 2014 19:11
For computing sentence length data for PHI5 authors.
"""For computing sentence length data for PHI5 authors."""
import ast
from cltk.tokenize.sentence_tokenizer_latin import tokenize_latin_sentences
from collections import Counter
from nltk.tokenize import RegexpTokenizer
import os
import re
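The preview above stops after the imports, so the script's actual logic is not shown. As a rough sketch of how a per-author record with the fields seen in the data files (`sent_count`, `word_count`, `avg_words_per_sent`, `tally_of_sent_word_lengths`) could be built, the following uses only the standard library. The function name `sentence_stats` is hypothetical, and the naive regex split is a stand-in for the CLTK sentence tokenizer and NLTK `RegexpTokenizer` imported above, not a reproduction of them.

```python
import re
from collections import Counter


def sentence_stats(text):
    """Return sentence-length statistics for one author's text (hypothetical helper)."""
    # Naive split on ., ? and !; an assumed stand-in for a real sentence tokenizer.
    sentences = [s for s in re.split(r"[.?!]+", text) if s.strip()]
    # Count word-like tokens in each sentence and drop sentences with none.
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    lengths = [n for n in lengths if n > 0]
    sent_count = len(lengths)
    word_count = sum(lengths)
    return {
        "sent_count": sent_count,
        "word_count": word_count,
        "avg_words_per_sent": word_count / sent_count if sent_count else 0.0,
        # Maps sentence length in words to the number of sentences of that length.
        "tally_of_sent_word_lengths": dict(Counter(lengths)),
    }


stats = sentence_stats("Gallia est omnis divisa in partes tres. Veni, vidi, vici.")
print(stats)
```

A real run would swap the regex split for a language-aware tokenizer, since abbreviations and editorial punctuation in the corpora would otherwise be miscounted as sentence boundaries.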
kylepjohnson / phi5_auth_sent_data_v3.txt
Last active August 29, 2015 14:06
Words per sentence data for PHI5 authors.
{'Sentius Augurinus': {'sent_count': 4, 'word_count': 45, 'avg_words_per_sent': 11.25, 'tally_of_sent_word_lengths': {17: 1, 10: 1, 5: 1, 13: 1}}, 'Cornelius Epicadus': {'sent_count': 1, 'word_count': 8, 'avg_words_per_sent': 8.0, 'tally_of_sent_word_lengths': {8: 1}}, 'Marcus Aurelius': {'sent_count': 1, 'word_count': 5, 'avg_words_per_sent': 5.0, 'tally_of_sent_word_lengths': {5: 1}}, 'Publius Rutilius Lupus': {'sent_count': 432, 'word_count': 4388, 'avg_words_per_sent': 10.157407407407407, 'tally_of_sent_word_lengths': {1: 78, 2: 31, 3: 5, 4: 20, 5: 11, 6: 14, 7: 18, 8: 24, 9: 27, 10: 25, 11: 23, 12: 15, 13: 21, 14: 18, 15: 13, 16: 15, 17: 8, 18: 12, 19: 6, 20: 6, 21: 7, 22: 3, 23: 3, 24: 4, 25: 3, 26: 2, 27: 4, 28: 1, 29: 2, 30: 1, 31: 1, 32: 1, 33: 1, 34: 1, 35: 2, 37: 2, 40: 1, 52: 2, 79: 1}}, 'Priapea': {'sent_count': 248, 'word_count': 3519, 'avg_words_per_sent': 14.189516129032258, 'tally_of_sent_word_lengths': {1: 1, 2: 8, 3: 5, 4: 9, 5: 12, 6: 17, 7: 11, 8: 6, 9: 7, 10: 14, 11: 18, 12: 14, 13: 23,
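The data file is a Python dict literal with integer keys in the tallies, which JSON cannot represent directly. One way to read it back, presumably the reason the scripts import `ast`, is `ast.literal_eval`. The sketch below parses a single record copied from the data above; reading the whole file from disk would work the same way once the full text is loaded into a string.

```python
import ast

# One complete record copied from the data above; the full file holds many authors.
raw = (
    "{'Sentius Augurinus': {'sent_count': 4, 'word_count': 45, "
    "'avg_words_per_sent': 11.25, "
    "'tally_of_sent_word_lengths': {17: 1, 10: 1, 5: 1, 13: 1}}}"
)

# literal_eval only evaluates Python literals, so it is safer than eval().
data = ast.literal_eval(raw)

record = data["Sentius Augurinus"]
tally = record["tally_of_sent_word_lengths"]

# The tally keys are sentence lengths in words, so max() gives the longest sentence.
longest = max(tally)
print(record["avg_words_per_sent"], longest)
```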
kylepjohnson / tlg_auth_word_sentence_v3.csv
Last active August 29, 2015 14:06
CSV export of words per sentence data for TLG authors.
author,1,2,3,5,9,avg_words_per_sent,17,sent_count,21,22,word_count,26,4,6,7,8,10,11,12,13,14,15,16,18,19,20,24,25,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,51,52,53,54,55,58,61,62,63,64,65,67,71,73,78,79,80,81,88,89,90,94,95,103,108,110,116,122,124,125,23,142,173,198,50,84,134,98,99,56,60,75,57,59,138,68,70,74,77,82,86,130,97,102,119,66,72,76,101,111,184,96,107,113,169,69,83,85,87,106,127,146,204,208,91,803,143,100,117,92,273,104,115,118,135,140,151,93,109,126,131,137,145,147,149,154,159,161,166,167,175,189,193,469,218,220,242,248,129,165,105,160,171,944,565,395,1497,474,380,112,114,120,123,128,132,133,136,139,141,144,148,150,152,153,155,156,157,158,163,168,170,172,176,178,179,182,187,190,192,195,205,209,210,216,232,233,552,254,259,775,788,564,335,403,418,591,679,528,180,121,261,174,186,188,191,194,268,164,181,183,185,199,207,212,213,225,227,245,266,294,196,282,761,162,356,252,203,177,197,200,201,206,228,230,246,276,287,295,306,456,223,215,217,221,226,234,237,247,251,253,260,271,283
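The header row above mixes the summary fields with one column per sentence length, which suggests each author's tally was flattened into the row. A minimal sketch of such an export with `csv.DictWriter` follows. The function name `to_csv` is hypothetical, the column order (summary fields first, then lengths ascending) is a choice made here and differs from the header above, and leaving absent lengths blank is an assumption about how missing values were handled.

```python
import csv
import io


def to_csv(data):
    """Flatten per-author records into CSV text (hypothetical helper)."""
    rows = []
    lengths = set()
    for author, rec in data.items():
        row = {
            "author": author,
            "sent_count": rec["sent_count"],
            "word_count": rec["word_count"],
            "avg_words_per_sent": rec["avg_words_per_sent"],
        }
        # One column per sentence length, holding the number of such sentences.
        for length, n in rec["tally_of_sent_word_lengths"].items():
            row[str(length)] = n
            lengths.add(length)
        rows.append(row)

    summary = ["author", "sent_count", "word_count", "avg_words_per_sent"]
    fieldnames = summary + [str(n) for n in sorted(lengths)]

    buf = io.StringIO()
    # restval="" leaves a blank cell where an author has no sentence of that length.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sample = {
    "Cornelius Epicadus": {"sent_count": 1, "word_count": 8, "avg_words_per_sent": 8.0,
                           "tally_of_sent_word_lengths": {8: 1}},
    "Marcus Aurelius": {"sent_count": 1, "word_count": 5, "avg_words_per_sent": 5.0,
                        "tally_of_sent_word_lengths": {5: 1}},
}
print(to_csv(sample))
```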
kylepjohnson / avg_words_per_sentence_per_tlg_author_v3.py
Last active August 29, 2015 14:06
For computing sentence length data for TLG authors.
"""For computing sentence length data for TLG authors."""
import ast
from cltk.tokenize.sentence_tokenizer_greek import tokenize_greek_sentences
from collections import Counter
from nltk.tokenize import RegexpTokenizer
import os
import re
kylepjohnson / tlg_auth_sent_data_v3.txt
Created September 21, 2014 16:33
Words per sentence data for TLG authors.
{'Elegiaca Adespota (CA)': {'sent_count': 10, 'word_count': 187, 'avg_words_per_sent': 18.7, 'tally_of_sent_word_lengths': {2: 1, 5: 1, 39: 1, 8: 1, 12: 1, 16: 3, 19: 1, 54: 1}}, 'Apollodorus Carystius vel Apollodorus Gelous Comic.': {'sent_count': 57, 'word_count': 856, 'avg_words_per_sent': 15.017543859649123, 'tally_of_sent_word_lengths': {1: 1, 2: 1, 3: 2, 5: 3, 6: 6, 11: 5, 12: 3, 13: 7, 14: 3, 15: 4, 16: 2, 17: 1, 18: 2, 19: 1, 20: 5, 22: 2, 23: 1, 24: 1, 25: 1, 27: 1, 28: 2, 30: 1, 32: 1, 49: 1}}, 'Aristocrates Hist.': {'sent_count': 26, 'word_count': 234, 'avg_words_per_sent': 9.0, 'tally_of_sent_word_lengths': {1: 12, 2: 3, 3: 1, 10: 1, 43: 1, 12: 1, 45: 1, 14: 1, 13: 1, 19: 2, 26: 1, 15: 1}}, 'Echembrotus Eleg. et Lyr.': {'sent_count': 2, 'word_count': 19, 'avg_words_per_sent': 9.5, 'tally_of_sent_word_lengths': {17: 1, 2: 1}}, 'Anonymi In Aristotelis Sophisticos Elenchos Phil.': {'sent_count': 2350, 'word_count': 56930, 'avg_words_per_sent': 24.22553191489362, 'tally_of_sent_word_lengths': {1: 26,
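The fields in each record are related: `avg_words_per_sent` should equal `word_count / sent_count`, and the tally values should add up to `sent_count`. The sketch below checks this for two records copied from the data above and also reports the word total implied by the tally. The function name `check_record` is hypothetical. In the visible records the tally-implied total does not always equal `word_count` (for 'Aristocrates Hist.' it comes to 237 against a stored 234), so the function reports it without treating a mismatch as an error; why the two differ cannot be determined from the truncated scripts.

```python
import math


def check_record(rec):
    """Summarize the internal consistency of one author record (hypothetical helper)."""
    tally = rec["tally_of_sent_word_lengths"]
    tally_sentences = sum(tally.values())
    tally_words = sum(length * n for length, n in tally.items())
    return {
        # Number of sentences in the tally should match the stored sentence count.
        "sentences_match": tally_sentences == rec["sent_count"],
        # The stored average should be word_count divided by sent_count.
        "average_matches": math.isclose(
            rec["avg_words_per_sent"], rec["word_count"] / rec["sent_count"]
        ),
        # Reported only; it may differ from the stored word_count.
        "tally_words": tally_words,
    }


elegiaca = {"sent_count": 10, "word_count": 187, "avg_words_per_sent": 18.7,
            "tally_of_sent_word_lengths": {2: 1, 5: 1, 39: 1, 8: 1, 12: 1,
                                           16: 3, 19: 1, 54: 1}}
aristocrates = {"sent_count": 26, "word_count": 234, "avg_words_per_sent": 9.0,
                "tally_of_sent_word_lengths": {1: 12, 2: 3, 3: 1, 10: 1, 43: 1,
                                               12: 1, 45: 1, 14: 1, 13: 1, 19: 2,
                                               26: 1, 15: 1}}

print(check_record(elegiaca))
print(check_record(aristocrates))
```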