Skip to content

Instantly share code, notes, and snippets.

🚀
Working on a rocket ticket !

杨海宏 RamonYeung

🚀
Working on a rocket ticket !
  • MIT, The Alibaba DAMO Academy
  • Hangzhou, China
Block or report user

Report or block RamonYeung

Hide content and notifications from this user.

Learn more about blocking users

Contact Support about this user’s behavior.

Learn more about reporting abuse

Report abuse
View GitHub Profile
@RamonYeung
RamonYeung / BPE
Created Jun 27, 2019 — forked from ranihorev/BPE
Byte Pair Encoding example (Source: Sennrich et al. - https://arxiv.org/abs/1508.07909)
View BPE
import re, collections
def get_stats(vocab):
pairs = collections.defaultdict(int)
for word, freq in vocab.items():
symbols = word.split()
for i in range(len(symbols)-1):
pairs[symbols[i],symbols[i+1]] += freq
return pairs
View bad_grad_viz.py
from graphviz import Digraph
import torch
from torch.autograd import Variable, Function
def iter_graph(root, callback):
queue = [root]
seen = set()
while queue:
fn = queue.pop()
if fn in seen:
View text2img.py
# encoding: utf-8
import os
import pygame
font_file = '/System/Library/Fonts/PingFang.ttc'
chinese_dir = 'chinese'
if not os.path.exists(chinese_dir):
os.mkdir(chinese_dir)
View tmux_cheatsheet.markdown

tmux cheatsheet

As configured in my dotfiles.

start new:

tmux

start new with session name:

View tmux-cheatsheet.markdown

tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname
@RamonYeung
RamonYeung / pad_packed_demo.py
Created Feb 16, 2019 — forked from Tushar-N/pad_packed_demo.py
How to use pad_packed_sequence in pytorch
View pad_packed_demo.py
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
seqs = ['gigantic_string','tiny_str','medium_str']
# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))
# make model
View download_glue_data.py
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
@RamonYeung
RamonYeung / loading_pretrained_models
Created Jul 12, 2018
PyTorch Loading Pre-trained Models
View loading_pretrained_models
# 1. Directly Load a Pre-trained Model
# https://github.com/pytorch/vision/tree/master/torchvision/models
import torchvision.models as models
resnet50 = models.resnet50(pretrained=True)
# or
model = models.resnet50(pretrained=False)
# Maybe you want to modify the last fc layer?
resnet.fc = nn.Linear(2048, 2)
@RamonYeung
RamonYeung / simplify_chinese_characters.py
Created Jul 4, 2018
Demo for Converting Traditional Chinese Characters to Simplified Chinese Characters.
View simplify_chinese_characters.py
# copy and paste two modules.
# https://raw.githubusercontent.com/skydark/nstools/master/zhtools/langconv.py
# https://raw.githubusercontent.com/skydark/nstools/master/zhtools/zh_wiki.py
from langconv import *
def Traditional2Simplified(sentence):
sentence = Converter('zh-hans').convert(sentence)
return sentence
View fast_glove.py
# coding: utf-8
import logging
import re
from collections import Counter
import numpy as np
import torch
from sklearn.datasets import fetch_20newsgroups
from torch.autograd import Variable
You can’t perform that action at this time.