杨海宏 RamonYeung

  • MIT, The Alibaba DAMO Academy
  • Hangzhou, China
@RamonYeung
RamonYeung / BPE
Created June 27, 2019 08:33 — forked from ranihorev/BPE
Byte Pair Encoding example (Source: Sennrich et al. - https://arxiv.org/abs/1508.07909)
import re, collections

def get_stats(vocab):
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols)-1):
            pairs[symbols[i],symbols[i+1]] += freq
    return pairs
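As a rough illustration of how `get_stats` fits into one Byte Pair Encoding step, the sketch below counts adjacent symbol pairs in a toy vocabulary, picks the most frequent pair, and merges it. The `merge_pair` helper and the `toy_vocab` data are assumptions made for this example and are not taken from the preview above; the merge is written with plain list handling rather than any particular implementation from the cited paper.

```python
import collections


def get_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[symbols[i], symbols[i + 1]] += freq
    return pairs


def merge_pair(pair, vocab):
    """Hypothetical helper: join every occurrence of `pair` into one symbol."""
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out = []
        i = 0
        while i < len(symbols):
            # Merge only when both symbols of the pair appear side by side.
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged


# Toy vocabulary (illustrative assumption): space-separated symbols -> frequency.
toy_vocab = {
    "l o w </w>": 5,
    "l o w e r </w>": 2,
    "n e w e s t </w>": 6,
    "w i d e s t </w>": 3,
}

pairs = get_stats(toy_vocab)
best = max(pairs, key=pairs.get)  # most frequent adjacent pair
merged_vocab = merge_pair(best, toy_vocab)
print(best, pairs[best])
print(merged_vocab)
```

Repeating the count-and-merge step a fixed number of times would grow the symbol inventory one merged pair at a time.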
from graphviz import Digraph
import torch
from torch.autograd import Variable, Function

def iter_graph(root, callback):
    queue = [root]
    seen = set()
    while queue:
        fn = queue.pop()
        if fn in seen:

tmux cheatsheet

As configured in my dotfiles.

start new:

tmux

start new with session name:

@RamonYeung
RamonYeung / tmux-cheatsheet.markdown
Created April 5, 2019 03:38 — forked from MohamedAlaa/tmux-cheatsheet.markdown
tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname
@RamonYeung
RamonYeung / pad_packed_demo.py
Created February 16, 2019 17:09 — forked from Tushar-N/pad_packed_demo.py
How to use pad_packed_sequence in pytorch
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
seqs = ['gigantic_string','tiny_str','medium_str']
# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))
# make model
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
# coding: utf-8
import logging
import re
from collections import Counter
import numpy as np
import torch
from sklearn.datasets import fetch_20newsgroups
from torch.autograd import Variable
@RamonYeung
RamonYeung / rank_metrics.py
Created May 17, 2018 06:47 — forked from bwhite/rank_metrics.py
Ranking Metrics
"""Information Retrieval metrics
Useful Resources:
http://www.cs.utexas.edu/~mooney/ir-course/slides/Evaluation.ppt
http://www.nii.ac.jp/TechReports/05-014E.pdf
http://www.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
http://hal.archives-ouvertes.fr/docs/00/72/67/60/PDF/07-busa-fekete.pdf
Learning to Rank for Information Retrieval (Tie-Yan Liu)
"""
import numpy as np
@RamonYeung
RamonYeung / pg-pong.py
Created February 22, 2018 13:32 — forked from karpathy/pg-pong.py
Training a Neural Network ATARI Pong agent with Policy Gradients from raw pixels
""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
import numpy as np
import pickle  # Python 3; the Python 2 original used cPickle
import gym
# hyperparameters
H = 200 # number of hidden layer neurons
batch_size = 10 # every how many episodes to do a param update?
learning_rate = 1e-4
gamma = 0.99 # discount factor for reward
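The `gamma` hyperparameter above is described only as the discount factor for reward. As a minimal sketch of what such a factor usually does, the snippet below turns a list of per-step rewards into discounted returns. The function name `discounted_returns` and the sample rewards are hypothetical and are not taken from the original script, which may handle episode boundaries differently.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: no reward until the final step.
example = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
print(example)
```

With a value close to 1, such as the 0.99 used above, rewards far in the future still contribute strongly to earlier steps; smaller values make the agent more short-sighted.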
@RamonYeung
RamonYeung / tree.md
Created December 29, 2016 08:15 — forked from upsuper/tree.md
One-line tree in Python

Using Python's built-in defaultdict (from the collections module), we can easily define a tree data structure:

def tree(): return defaultdict(tree)

That's it!
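A short usage sketch may help show why the one-liner works: looking up a missing key calls `tree()` again, so nested levels are created on demand. The `config` data below is invented for illustration, and the conversion through `json` is just one convenient way to inspect the result.

```python
import json
from collections import defaultdict


def tree():
    # Each missing key yields another tree, so any depth can be assigned directly.
    return defaultdict(tree)


# Hypothetical data: intermediate levels are created automatically.
config = tree()
config["server"]["http"]["port"] = 8080
config["server"]["name"] = "demo"

# defaultdict is a dict subclass, so json can serialize it as nested objects.
as_json = json.dumps(config, sort_keys=True)
print(as_json)
```

One caveat of this design: merely reading a missing key also creates it, so membership checks should use `in` rather than indexing.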