# Tokenize a Persian sentence with stanfordnlp (notebook cell)
!pip install stanfordnlp
import stanfordnlp

stanfordnlp.download('fa')
nlp = stanfordnlp.Pipeline(processors="tokenize", lang="fa", models_dir=".")
# Named `text` rather than `str` to avoid shadowing the built-in type
text = "برای رفتن به سمرقند باید از خانهای بیشمار گذشت"
doc = nlp(text)
for sent in doc.sentences:
    for wrd in sent.words:
        print(wrd.text)
# Token-based batching for torchtext's BucketIterator (snippet is truncated in the source)
from torchtext.data import BucketIterator, interleave_keys

batch_size = 32

def batch_size_fn(new, count, sofar):
    "Keep augmenting batch and calculate total number of tokens + padding."
    # Arguments: (new example to add, current count of examples in the batch,
    # current effective batch size).
    # Returns the new effective batch size. Once the returned value reaches the
    # global batch_size, the iterator wrapper closes the current batch.
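The function body is cut off in the source, so the following is only a minimal sketch of how such a function could count tokens plus padding, not the author's implementation. The `Example` type, the `text` field, `token_batch_size_fn`, and the `token_batches` loop are all hypothetical names; the loop is a simplified stand-in for the iterator wrapper so the example runs without torchtext.

```python
from collections import namedtuple

# Hypothetical example type with a single tokenized field.
Example = namedtuple("Example", ["text"])

_max_len = 0  # longest example seen in the batch being built


def token_batch_size_fn(new, count, sofar):
    """Return the padded token count of the batch after adding `new`."""
    global _max_len
    if count == 1:
        # First example of a fresh batch: reset the running maximum.
        _max_len = 0
    _max_len = max(_max_len, len(new.text))
    # Every example is padded to the longest one in the batch.
    return count * _max_len


def token_batches(examples, max_tokens, size_fn):
    """Simplified stand-in for the wrapper that consumes size_fn."""
    batch, size = [], 0
    for ex in examples:
        batch.append(ex)
        size = size_fn(ex, len(batch), size)
        if size == max_tokens:
            yield batch
            batch, size = [], 0
        elif size > max_tokens and len(batch) > 1:
            # The newest example overflowed the budget: emit the rest
            # and start the next batch with the newest example.
            yield batch[:-1]
            batch = batch[-1:]
            size = size_fn(ex, 1, 0)
    if batch:
        yield batch


examples = [Example(text=["tok"] * n) for n in (3, 5, 2, 8, 1)]
batch_lengths = [
    [len(ex.text) for ex in b]
    for b in token_batches(examples, 10, token_batch_size_fn)
]
print(batch_lengths)
```

With a budget of 10 tokens, the first two examples fill a batch exactly (2 examples padded to length 5), while the 8-token example forces the batches around it to close early.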
[
  {"page": "Mathematics"},
  {"page": "Mathematician"},
  {"page": "Arithmetic"},
  {"page": "Addition"},
  {"page": "Subtraction"},
  {"page": "Multiplication"},
  {"page": "Division (mathematics)"},
  {"page": "Euclidean algorithm"},
  {"page": "Fraction (mathematics)"},
# Convert a JSON array file into one JSON object per line
import json

with open('test_random_split.json') as freader:
    data = json.load(freader)

with open('correct-sample.json', 'w') as outfile:
    for entry in data:
        json.dump(entry, outfile)
        outfile.write('\n')
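As a quick check that the line-per-object output can be read back, here is a small round-trip sketch. The helper names `write_jsonl` and `read_jsonl`, the temporary file name, and the two sample entries are assumptions made for illustration; only the writing loop mirrors the snippet above.

```python
import json
import os
import tempfile


def write_jsonl(entries, path):
    """Write each entry as one JSON object per line."""
    with open(path, "w") as outfile:
        for entry in entries:
            json.dump(entry, outfile)
            outfile.write("\n")


def read_jsonl(path):
    """Parse one JSON object per non-empty line."""
    with open(path) as infile:
        return [json.loads(line) for line in infile if line.strip()]


sample = [{"page": "Mathematics"}, {"page": "Arithmetic"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    write_jsonl(sample, path)
    restored = read_jsonl(path)

print(restored == sample)
```

Because `json.dump` escapes newlines inside strings by default, each object stays on a single line and the file can be parsed line by line.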
# Load the JSON dataset and start building a new one (snippet is truncated in the source)
import json
from pprint import pprint
from tqdm import tqdm

with open('test_random_split.json') as freader:
    data = json.load(freader)
print(len(data))
new_dataset = []
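# Add sentiment polarity, character length, and word count columns (df is an existing DataFrame with a 'Text' column)
import textblob

df['polarity'] = df['Text'].map(lambda text: textblob.TextBlob(text).sentiment.polarity)
df['review_len'] = df['Text'].astype(str).apply(len)
df['word_count'] = df['Text'].apply(lambda x: len(str(x).split()))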
# Download datasets from Google Drive in a notebook (snippet is truncated in the source)
! wget -q "https://drive.google.com/uc?export=download&id=1-3tnHTdDjtMd9O2LgKN2ir3t5KvnqrXI" -O dataset.zip
! unzip dataset.zip

import subprocess
import shlex

file_id = "1xhiGDTihHYUbGES88sYt4S6nLDjKEji1"
file_name = "mscoco.zip"
url_get_cookie = f"https://drive.google.com/uc?export=download&id={file_id}"