Iam primum omnium satis constat Troia capta in ceteros saevitum esse Troianos, duobus, Aeneae Antenorique, et vetusti iure hospitii et quia pacis reddendaeque Helenae semper auctores fuerant, omne ius belli Achiuos abstinuisse; casibus deinde variis Antenorem cum multitudine Enetum, qui seditione ex Paphlagonia pulsi et sedes et ducem rege Pylaemene ad Troiam amisso quaerebant, venisse in intimum maris Hadriatici sinum, Euganeisque qui inter mare Alpesque incolebant pulsis Enetos Troianosque eas tenuisse terras. Et in quem primo egressi sunt locum Troia vocatur pagoque inde Troiano nomen est: gens universa Veneti appellati. Aeneam ab simili clade domo profugum sed ad maiora rerum initia ducentibus fatis, primo in Macedoniam venisse, inde in Siciliam quaerentem sedes delatum, ab Sicilia classe ad Laurentem agrum tenuisse. Troia et huic loco nomen est. Ibi egressi Troiani, ut quibus ab immenso prope errore nihil praeter arma et naues superesset, cum praedam ex agris agerent, Latinus rex Aboriginesque qui tum ea
Θουκυδίδης Ἀθηναῖος ξυνέγραψε τὸν πόλεμον τῶν Πελοποννησίων καὶ Ἀθηναίων, ὡς ἐπολέμησαν πρὸς ἀλλήλους, ἀρξάμενος εὐθὺς καθισταμένου καὶ ἐλπίσας μέγαν τε ἔσεσθαι καὶ ἀξιολογώτατον τῶν προγεγενημένων, τεκμαιρόμενος ὅτι ἀκμάζοντές τε ᾖσαν ἐς αὐτὸν ἀμφότεροι παρασκευῇ τῇ πάσῃ καὶ τὸ ἄλλο Ἑλληνικὸν ὁρῶν ξυνιστάμενον πρὸς ἑκατέρους, τὸ μὲν εὐθύς, τὸ δὲ καὶ διανοούμενον. κίνησις γὰρ αὕτη μεγίστη δὴ τοῖς Ἕλλησιν ἐγένετο καὶ μέρει τινὶ τῶν βαρβάρων, ὡς δὲ εἰπεῖν καὶ ἐπὶ πλεῖστον ἀνθρώπων. τὰ γὰρ πρὸ αὐτῶν καὶ τὰ ἔτι παλαίτερα σαφῶς μὲν εὑρεῖν διὰ χρόνου πλῆθος ἀδύνατα ἦν, ἐκ δὲ τεκμηρίων ὧν ἐπὶ μακρότατον σκοποῦντί μοι πιστεῦσαι ξυμβαίνει οὐ μεγάλα νομίζω γενέσθαι οὔτε κατὰ τοὺς πολέμους οὔτε ἐς τὰ ἄλλα. φαίνεται γὰρ ἡ νῦν Ἑλλὰς καλουμένη οὐ πάλαι βεβαίως οἰκουμένη, ἀλλὰ μεταναστάσεις τε οὖσαι τὰ πρότερα καὶ ῥᾳδίως ἕκαστοι τὴν ἑαυτῶν ἀπολείποντες βιαζόμενοι ὑπό τινων αἰεὶ πλειόνων. τῆς γὰρ ἐμπορίας οὐκ οὔσης, οὐδ᾽ ἐπιμειγνύντες ἀδεῶς ἀλλήλοις οὔτε κατὰ γῆν οὔτε διὰ θαλάσσης, νεμόμενοί τε τὰ αὑτῶν ἕκαστοι ὅσον ἀποζ
(cltk) AMAC02Z92FELVCG:cltkv1 kyle.p.johnson$ mv ~/cltk_data ~/cltk_data_bak
(cltk) AMAC02Z92FELVCG:cltkv1 kyle.p.johnson$ poetry run python src/cltkv1/nlp.py
Do you want to download file 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.ar.vec' to '/Users/kyle.p.johnson/cltk_data/arb/embeddings/fasttext/wiki.ar.vec'? [y/n] y
100%|█████████████████████████████████████| 1.61G/1.61G [02:39<00:00, 10.1MiB/s]
Do you want to download file 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.arc.vec' to '/Users/kyle.p.johnson/cltk_data/arc/embeddings/fasttext/wiki.arc.vec'? [y/n] y
100%|█████████████████████████████████████| 8.66M/8.66M [00:00<00:00, 10.9MiB/s]
Do you want to download file 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.got.vec' to '/Users/kyle.p.johnson/cltk_data/got/embeddings/fasttext/wiki.got.vec'? [y/n] y
100%|█████████████████████████████████████| 6.94M/6.94M [00:00<00:00, 10.3MiB/s]
Do you want to download file 'https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/
(cltk) AMAC02Z92FELVCG:cltkv1 kyle.p.johnson$ poetry run ipython
Python 3.7.5 (default, Oct 31 2019, 20:57:45)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import stanfordnlp
In [2]: stanfordnlp.__version__
Out[2]: '0.2.0'
$ ipython
Python 3.7.4 (v3.7.4:e09359112e, Jul 8 2019, 14:54:52)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from cltk.corpus.latin.wordnet import WordNetCorpusReader
In [2]: LWN = WordNetCorpusReader()
In [3]: uirtus = LWN.lemma('uirtus', 'n', 'n-s---fn3-')
"""An example of a proposed NLP pipeline system. The goals are to allow for:
1. a default NLP pipeline for any given language
2. users to override the default pipeline
3. users to choose alternative code (classes/methods/functions) within the CLTK
4. users to use their own custom code (inheriting from or replacing code within the CLTK)
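The four goals above can be illustrated with a minimal sketch. This is not the CLTK's actual design: the `NLP` class, the `DEFAULT_PIPELINES` table, and the three process functions are hypothetical names invented here, and a "process" is assumed to be any callable that takes and returns a plain document dict.

```python
from typing import Callable, Dict, List, Optional

# Assumption: a process is any callable that takes and returns a document dict.
Process = Callable[[dict], dict]


def whitespace_tokenize(doc: dict) -> dict:
    """Hypothetical built-in process: split the raw text on whitespace."""
    doc["tokens"] = doc["text"].split()
    return doc


def lowercase_tokens(doc: dict) -> dict:
    """Hypothetical built-in process: lowercase every token."""
    doc["tokens"] = [token.lower() for token in doc["tokens"]]
    return doc


def drop_punctuation(doc: dict) -> dict:
    """Hypothetical user-supplied process (goal 4): trim edge punctuation."""
    doc["tokens"] = [token.strip(".,;:") for token in doc["tokens"]]
    return doc


# Goal 1: a default pipeline for each language (codes are illustrative).
DEFAULT_PIPELINES: Dict[str, List[Process]] = {
    "lat": [whitespace_tokenize, lowercase_tokens],
    "grc": [whitespace_tokenize],
}


class NLP:
    """Run a list of processes, in order, over a text."""

    def __init__(self, language: str, pipeline: Optional[List[Process]] = None) -> None:
        self.language = language
        # Goals 2-4: an explicit pipeline replaces the language default.
        if pipeline is not None:
            self.pipeline = list(pipeline)
        else:
            self.pipeline = list(DEFAULT_PIPELINES[language])

    def analyze(self, text: str) -> dict:
        doc = {"text": text}
        for process in self.pipeline:
            doc = process(doc)
        return doc


default_doc = NLP("lat").analyze("Gallia est omnis divisa")
custom_doc = NLP(
    "lat", pipeline=[whitespace_tokenize, drop_punctuation]
).analyze("Arma virumque cano, Troiae")
```

Keeping each process a plain callable means built-in and user-written steps are interchangeable, which is one simple way to satisfy goals 3 and 4 without a class hierarchy.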
#!/bin/bash
# Stop at the first failing command.
set -e
apt-get update -q
apt-get upgrade -q -y
apt-get install -y software-properties-common
# Redirect stdin so add-apt-repository does not block on its confirmation prompt.
add-apt-repository ppa:webupd8team/java < /dev/null
apt-get update -q
# Pre-accept the Oracle license so the Java installer can run unattended.
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
echo debconf shared/accepted-oracle-license-v1-1 seen true | sudo debconf-set-selections
Running python-crfsuite-0.9.5/setup.py -q bdist_egg --dist-dir /tmp/easy_install-lbeiimu8/python-crfsuite-0.9.5/egg-dist-tmp-3yelhlin
cc1plus: warning: command line option ‘-std=c99’ is valid for C/ObjC but not for C++
pycrfsuite/_pycrfsuite.cpp: In function ‘void __Pyx__ExceptionSave(PyThreadState*, PyObject**, PyObject**, PyObject**)’:
pycrfsuite/_pycrfsuite.cpp:14140:21: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_type’; did you mean ‘curexc_type’?
*type = tstate->exc_type;
^~~~~~~~
curexc_type
pycrfsuite/_pycrfsuite.cpp:14141:22: error: ‘PyThreadState {aka struct _ts}’ has no member named ‘exc_value’; did you mean ‘curexc_value’?
*value = tstate->exc_value;
^~~~~~~~~
'\u1fbc',  # ᾼ Greek Capital Letter Alpha with Prosgegrammeni
'\u0391',  # Α Greek Capital Letter Alpha
'\u1fcc',  # ῌ Greek Capital Letter Eta with Prosgegrammeni
'\u0397',  # Η Greek Capital Letter Eta
'\u1ffc',  # ῼ Greek Capital Letter Omega with Prosgegrammeni
'\u03a9',  # Ω Greek Capital Letter Omega
'\u1f88',  # ᾈ Greek Capital Letter Alpha with Psili and Prosgegrammeni
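The list above pairs each iota-subscript (prosgegrammeni) character with its plain counterpart by hand. As a cross-check, the same pairing can be derived with the standard library's `unicodedata` module. This is an alternative technique rather than the approach the listing takes; the function name `strip_iota_subscript` is an illustrative assumption.

```python
import unicodedata

# U+0345 COMBINING GREEK YPOGEGRAMMENI is what the precomposed
# "with Prosgegrammeni" characters decompose to under NFD.
COMBINING_YPOGEGRAMMENI = "\u0345"


def strip_iota_subscript(text: str) -> str:
    """Return text with every iota subscript removed (hypothetical helper)."""
    # Decompose so the subscript becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop only the subscript; breathings and accents stay in place.
    stripped = decomposed.replace(COMBINING_YPOGEGRAMMENI, "")
    # Recompose the remaining base letter and diacritics.
    return unicodedata.normalize("NFC", stripped)


alpha_plain = strip_iota_subscript("\u1fbc")   # ᾼ
alpha_psili = strip_iota_subscript("\u1f88")   # ᾈ
```

Note that the NFC round trip also normalizes any other characters in the input, so a hand-written table like the one above is still the safer choice when the rest of the text must be preserved byte for byte.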
import os

# Read the file of blank-line-separated pairs.
with open(os.path.expanduser('~/Downloads/subscript-non-pairs')) as fo:
    text = fo.read()

pairs = text.split('\n\n')
map_sub_nosub = {}
for pair in pairs:
    # Each pair holds two lines.
    key, val = pair.split('\n')
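The script above is cut off before it shows what happens to `key` and `val`. A self-contained sketch of one plausible reading follows: it assumes the first line of each pair is the character with the subscript and the second is the character without, and that the loop stores them in the dict. The function name `parse_pairs` and the in-memory sample are illustrative and do not come from the original file.

```python
from typing import Dict


def parse_pairs(text: str) -> Dict[str, str]:
    """Map the first line of each blank-line-separated pair to the second."""
    mapping: Dict[str, str] = {}
    # strip() avoids an empty trailing block when the file ends in a newline.
    for block in text.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) != 2:
            # Skip malformed blocks rather than raising on unpacking.
            continue
        key, val = lines
        mapping[key] = val
    return mapping


# Hypothetical sample standing in for the contents of the file.
sample = "\u1fbc\n\u0391\n\n\u1fcc\n\u0397\n"
map_sub_nosub = parse_pairs(sample)
```

Checking the block length before unpacking matters in practice: a plain `key, val = pair.split('\n')` raises `ValueError` on a trailing newline or a stray blank line.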