Rani ranihorev

@ranihorev
ranihorev / BPE
Created Jan 6, 2019
Byte Pair Encoding example (Source: Sennrich et al. - https://arxiv.org/abs/1508.07909)
import re, collections

def get_stats(vocab):
    # Count adjacent symbol pairs, weighted by each word's frequency.
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols)-1):
            pairs[symbols[i],symbols[i+1]] += freq
    return pairs
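The preview above stops at `get_stats`, so the following is only a minimal sketch of how the pair counts might drive a single BPE merge. The `merge_vocab` helper and the `toy_vocab` data are assumptions added for illustration, loosely modeled on the algorithm described by Sennrich et al.; they are not taken from the gist preview.

```python
import collections
import re


def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by each word's frequency."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs


def merge_vocab(pair, vocab_in):
    """Hypothetical merge step: fuse every occurrence of `pair` into one symbol."""
    bigram = re.escape(" ".join(pair))
    # The lookarounds ensure we only match whole, space-delimited symbols.
    pattern = re.compile(r"(?<!\S)" + bigram + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq
            for word, freq in vocab_in.items()}


# Toy vocabulary (assumed data): words are space-separated symbols
# ending in an end-of-word marker, mapped to their corpus frequency.
toy_vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}

pairs = get_stats(toy_vocab)
best = max(pairs, key=pairs.get)       # most frequent adjacent pair
merged = merge_vocab(best, toy_vocab)  # vocabulary after one merge

print(best, pairs[best])
print(merged)
```

Repeating the count-then-merge step a fixed number of times would build up the merge table. Note that `max` breaks ties by dictionary insertion order, so which of several equally frequent pairs gets merged first depends on the order of the words in the vocabulary.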
@ranihorev
ranihorev / Structured_with_text.py
Last active Apr 13, 2019
PyTorch module for classification or regression on categorical, continuous, and text inputs. The module is based on the fast.ai library.
from fastai.text import *
from fastai.structured import proc_df
import pandas as pd
import numpy as np

class MixedInputModelWithText(nn.Module):
    def __init__(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops,
                 y_range=None, use_bn=False, is_reg=True, is_multi=False, n_text=0):
        super().__init__()
        for i, (c, s) in enumerate(emb_szs): assert c > 1, f"cardinality must be >=2, got emb_szs[{i}]: ({c},{s})"
View test8
0xCe5E7214E74b62F2a1398db5CFb86eF68f8c1EF3
View test7
0x732d75a4000cB2F38914FC1B6440A9E3753e21f2
View test6
0x541ea7e288d7344b28903ca862ff9fe3efc8c6cd
View test4
0xe837758e2f4a21dd0abd4ec3ca252b5ad352bcb2
View testA1
0xB4963717d4C74D7948054158aF885e264b000ee3
View test2
0xDE700F13f56777204ebA68b0b6F5E88ED4Cd03e8