View pairs.tsv
フルーツサラダ	fruits salad
クリッパーチップ	clipper chip
ライフサイクル	life cycle
ボイストレーニング	voice training
オップアート	op art
ノーズコーン	nose cone
インカムタックス	income tax
エグゼクティブフロア	executive floor
ウェブフォーム	web form
ハムサンド	ham sand
View rubert-embedding.py
#!/usr/bin/env python
# Documented in: https://metatext.io/models/DeepPavlov-rubert-base-cased
import transformers

model_name = "DeepPavlov/rubert-base-cased"
model = transformers.AutoModel.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
View asciify.pl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize;
use open ":encoding(utf8)";
binmode STDIN, ":encoding(utf8)";
binmode STDOUT, ":encoding(ascii)";
View 95to27.pl
#!/usr/bin/perl
use strict;
use warnings;
use open ":encoding(ascii)";
binmode STDIN, ":encoding(ascii)";
binmode STDOUT, ":encoding(ascii)";
binmode STDERR, ":encoding(ascii)";
View sgml2docs.py
#!/usr/bin/env python
"""Extracts documents from the Gigaword SGML."""

import argparse
import logging
import os

import bs4
View LING78100-lecture02.ipynb
View lnre.py
#!/usr/bin/env python
"""LNRE calculator.

This script computes a number of statistics characterizing LNRE data:

* N: corpus size
* V: vocabulary size
* V(1): the number of _hapax legomena_ (symbols occurring once)
* V(2): the number of _dis legomena_ (symbols occurring twice)
* V/N: vocabulary growth rate
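The preview of lnre.py ends inside its docstring, so the script's actual implementation is not visible here. As a rough illustration only, the statistics listed above could be computed along the following lines. The function name `lnre_stats`, the whitespace tokenization, and the toy sentence are assumptions made for this sketch and are not taken from the script.

```python
from collections import Counter


def lnre_stats(tokens):
    """Returns a dict of basic LNRE statistics (hypothetical helper)."""
    freqs = Counter(tokens)
    n = sum(freqs.values())  # N: corpus size (number of tokens).
    v = len(freqs)  # V: vocabulary size (number of distinct symbols).
    # Frequency spectrum: maps a frequency m to V(m), the number of
    # symbols that occur exactly m times.
    spectrum = Counter(freqs.values())
    return {
        "N": n,
        "V": v,
        "V(1)": spectrum[1],  # hapax legomena
        "V(2)": spectrum[2],  # dis legomena
        "V/N": v / n if n else 0.0,
    }


# Toy input, split on whitespace (an assumption; the script may tokenize
# differently).
stats = lnre_stats("the cat saw the dog and the bird saw it".split())
print(stats)
```

In the toy sentence, "the" occurs three times, "saw" twice, and the other five words once each, so the sketch reports N = 10, V = 7, V(1) = 5, and V(2) = 1.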
View byte.sym
<epsilon> 0
<SOH> 1
<STX> 2
<ETX> 3
<EOT> 4
<ENQ> 5
<ACK> 6
<BEL> 7
<BS> 8
<HT> 9
View casefold.py
#!/usr/bin/env python

import fileinput

import nltk


if __name__ == "__main__":
    for line in fileinput.input():
        print(line.rstrip().casefold())
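casefold.py relies on `str.casefold()` rather than `str.lower()`. The sketch below shows one case where the two differ. The sample strings and the helper name `fold_line` are illustrative assumptions, not part of the gist.

```python
def fold_line(line):
    """Mirrors the per-line step in casefold.py (hypothetical helper)."""
    return line.rstrip().casefold()


# Three spellings of the same German word.
samples = ["Straße\n", "STRASSE\n", "strasse\n"]

# lower() leaves the sharp s (ß) alone, so two distinct forms survive.
lowered = {s.rstrip().lower() for s in samples}

# casefold() maps ß to "ss", so all three spellings collapse to one form.
folded = {fold_line(s) for s in samples}

print(sorted(lowered))
print(sorted(folded))
```

Case folding is meant for caseless matching, so it suits normalizing text before counting or comparing tokens, where `lower()` would leave such variants distinct.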
View word_tokenize.py
#!/usr/bin/env python

import fileinput

import nltk


if __name__ == "__main__":
    for line in fileinput.input():
        print(" ".join(nltk.word_tokenize(line)))