$ make -j $(nproc)
Scanning dependencies of target nccl_install
Scanning dependencies of target marian_version
Scanning dependencies of target pathie-cpp
Scanning dependencies of target SQLiteCpp
Scanning dependencies of target libyaml-cpp
Scanning dependencies of target zlib
[ 0%] Running cpp protocol buffer compiler on sentencepiece_model.proto
[ 1%] Running cpp protocol buffer compiler on sentencepiece.proto
[ 2%] Running cpp protocol buffer compiler on sentencepiece_model.proto
[2019-08-15 08:31:02] [marian] Marian v1.7.8 c65c26d6 2019-08-11 18:27:00 +0100
[2019-08-15 08:31:02] [marian] Running on walle3 as process 24138 with command line:
[2019-08-15 08:31:02] [marian] /home/xyz/marian-dev/build/marian --model /disk2/models/xx-yy-r0/model.npz --type transformer --train-sets /disk2/data/xx-yy/train.sk /disk2/data/xx-yy/train.en --vocabs /disk2/models/xx-yy-r0/vocab.src.spm /disk2/models/xx-yy-r0/vocab.trg.spm --dim-vocabs 32000 32000 --mini-batch-fit --mini-batch 1000 --maxi-batch 1000 --valid-freq 10000 --save-freq 10000 --disp-freq 500 --valid-metrics ce-mean-words perplexity bleu-detok --valid-sets /disk2/data/xx-yy/valid.sk /disk2/data/xx-yy/valid.en --quiet-translation --beam-size 6 --normalize=0.6 --valid-mini-batch 16 --early-stopping 5 --cost-type=ce-mean-words --log /disk2/models/xx-yy-r0/train.log --valid-log /disk2/models/xx-yy-r0/valid.log --enc-depth 6 --dec-depth 6 --transformer-preprocess n --transformer-postprocess da --tied-embeddings-all --dim-emb 1024 --transforme
The Project Gutenberg EBook of The Adventures of Sherlock Holmes
by Sir Arthur Conan Doyle
(#15 in our series by Sir Arthur Conan Doyle)
Copyright laws are changing all over the world. Be sure to check the
copyright laws for your country before downloading or redistributing
this or any other Project Gutenberg eBook.
This header should be the first thing seen when viewing this Project
Gutenberg file. Please do not remove it. Do not change or edit the
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
$ th
  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch
th> torch.Tensor{1,2,3}
1
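import torch
from torch.utils.data import Dataset
from gensim.corpora import Dictionary  # assumed source of Dictionary

class ToxicDataset(Dataset):
    def __init__(self, texts, labels):
        self.texts = texts
        # Build the vocabulary once, then reserve ids 0 and 1 for special tokens
        special_tokens = {'<pad>': 0, '<unk>': 1}
        self.vocab = Dictionary(texts)
        self.vocab.patch_with_special_tokens(special_tokens)
        # Vectorize labels
        self.labels = torch.tensor(labels)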
import os
from argparse import Namespace
from collections import Counter
import json
import re
import string
import numpy as np
import pandas as pd
import torch
nationality  nationality_index  split  surname
Arabic       15                 train  Totah
Arabic       15                 train  Abboud
Arabic       15                 train  Fakhoury
Arabic       15                 train  Srour
Arabic       15                 train  Sayegh
Arabic       15                 train  Cham
Arabic       15                 train  Haik
Arabic       15                 train  Kattan
Arabic       15                 train  Khouri
Language is never, ever, ever, random
ADAM KILGARRIFF
Abstract
Language users never choose words randomly, and language is essentially
non-random. Statistical hypothesis testing uses a null hypothesis, which
@alvations
alvations / time_ngrams.md
Last active October 18, 2018 04:02
Zipping might not be as fast as we thought...

How fast can we make the ngrams() function in NLTK?

The Stack Overflow thread at https://stackoverflow.com/q/21883108/610569 suggests:

def zipgrams(sequence, n):
    """ From https://stackoverflow.com/q/21883108/610569"""
    return zip(*[sequence[i:] for i in range(n)])
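
To get a feel for how fast `zipgrams` actually is, one option is to time it with `timeit` against a simple baseline. The sketch below is only an illustration: `windowgrams` is a hypothetical baseline written for this comparison (it is not NLTK's `ngrams()`), and the input size and repeat count are arbitrary assumptions. Since `zip` is lazy in Python 3, both iterators are consumed with `list()` so that the work is really measured.

```python
import timeit


def zipgrams(sequence, n):
    """Zip n shifted slices of `sequence` to form n-grams."""
    return zip(*[sequence[i:] for i in range(n)])


def windowgrams(sequence, n):
    """Hypothetical baseline: slice out one window per start position."""
    return (tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))


# Small sanity check on a toy sentence.
tokens = "this is a foo bar sentence".split()
bigrams = list(zipgrams(tokens, 2))

# Arbitrary benchmark input; list() forces the lazy iterators to run.
data = list(range(10_000))
t_zip = timeit.timeit(lambda: list(zipgrams(data, 3)), number=20)
t_win = timeit.timeit(lambda: list(windowgrams(data, 3)), number=20)
print(f"zipgrams: {t_zip:.4f}s  windowgrams: {t_win:.4f}s")
```

Note that `zipgrams` slices its input, so it needs a sequence such as a list or string; a plain generator would have to be materialized first. Each slice also copies the tail of the sequence, which is one reason to measure rather than assume.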