(Bill) Yuchen Lin yuchenlin

from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
from torch.nn import CrossEntropyLoss
from tqdm import trange
max_length = 24
batch_size = 200
yuchenlin /
Last active Apr 24, 2020
Compute sentence probability using GPT-2 with huggingface transformers
import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import numpy as np
from scipy.special import softmax
def model_init(model_string, cuda):
    if model_string.startswith("gpt2"):
        tokenizer = GPT2Tokenizer.from_pretrained(model_string)
        model = GPT2LMHeadModel.from_pretrained(model_string)
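The preview stops before the scoring step. As a rough sketch of the idea in the gist's title, the log-probability of a sentence under a left-to-right language model is the sum, over positions, of the log-softmax score of each token given its prefix. The names `log_softmax` and `sentence_log_prob`, and the plain-list `logits` input, are assumptions for illustration; they stand in for whatever the model's forward pass returns and are not taken from the gist.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def sentence_log_prob(logits, token_ids):
    """Sum log P(token[t+1] | tokens[:t+1]) over the sentence.

    Assumption: logits[t] holds the vocabulary scores predicted at
    position t, i.e. for the token that FOLLOWS token_ids[t].
    The first token has no prefix here, so it is not scored.
    """
    total = 0.0
    for t in range(len(token_ids) - 1):
        total += log_softmax(logits[t])[token_ids[t + 1]]
    return total


# Toy usage: a 4-word vocabulary with uniform scores at every position.
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(sentence_log_prob(toy_logits, [1, 2, 3]))
```

Working in log space avoids the underflow that multiplying many small probabilities would cause; exponentiate the result only if a raw probability is really needed.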
yuchenlin /
Last active Feb 17, 2020
A simple example script for predicting masked words in a sentence using BERT.
import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM
import logging
logging.basicConfig(level=logging.INFO)  # OPTIONAL
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
chuanconggao /
Last active Jun 2, 2020
The original minimal 15 lines implementation of PrefixSpan. Full library at
from collections import defaultdict
def frequent_rec(patt, mdb):
    results.append((len(mdb), patt))
    occurs = defaultdict(list)
    for (i, startpos) in mdb:
        seq = db[i]
        for j in range(startpos + 1, len(seq)):
            l = occurs[seq[j]]
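The preview is truncated inside the inner loop. Below is a self-contained sketch of how a PrefixSpan-style recursion over projected databases can be completed. The `prefixspan(db, minsup)` wrapper, and everything after the line `l = occurs[seq[j]]`, are assumptions for illustration rather than the gist's own text.

```python
from collections import defaultdict


def prefixspan(db, minsup):
    """Return (support, pattern) pairs for frequent sequential patterns.

    db is a list of sequences; minsup is the minimum number of
    sequences a pattern must occur in.
    """
    results = []

    def frequent_rec(patt, mdb):
        # mdb is the projected database: (sequence index, position of
        # the last matched item) for every sequence containing patt.
        if patt:  # skip the empty root pattern
            results.append((len(mdb), patt))
        occurs = defaultdict(list)
        for i, startpos in mdb:
            seq = db[i]
            for j in range(startpos + 1, len(seq)):
                l = occurs[seq[j]]
                # Record only the first occurrence per sequence.
                if not l or l[-1][0] != i:
                    l.append((i, j))
        for item, newmdb in occurs.items():
            if len(newmdb) >= minsup:
                frequent_rec(patt + [item], newmdb)

    frequent_rec([], [(i, -1) for i in range(len(db))])
    return results


# Toy usage with three short sequences.
for support, pattern in prefixspan([["a", "b", "c"], ["a", "c"], ["b", "c"]], 2):
    print(support, pattern)
```

Keeping only the first occurrence of each item per sequence is what makes `len(newmdb)` equal to the pattern's support.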
peterjc123 / build.ps1
Last active Nov 12, 2018
Setup script for Windows PyTorch
# Prerequisites
# 1. MSVC 2017 C++ Build Tools
# 2. CMake 3.0 or later
# 3. 64-bit Windows
# 4. Anaconda / Miniconda, 64-bit
# Prerequisites for CUDA
# 1. CUDA 8.0 or later
# 2. NVTX (in CUDA as Visual Studio Integration; if it fails to install, you can extract
# the CUDA installer exe and find the NVTX installer under CUDAVisualStudioIntegration)
# this script installs GCC 5.4.0
# to use it navigate to your home directory and type:
# sh
# download and install gcc 5.4.0
tar xzf gcc-5_4_0-release.tar.gz
cd gcc-5_4_0-release
Tushar-N /
Last active May 18, 2020
How to use pad_packed_sequence in pytorch<1.1.0
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
seqs = ['gigantic_string','tiny_str','medium_str']
# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))
# make model
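The preview ends before the sequences are turned into tensors. As a hedged sketch of the preprocessing that usually precedes `pack_padded_sequence`, the plain-Python helper below maps characters to vocabulary indices, sorts the sequences by decreasing length (older PyTorch releases require this ordering, and newer ones still expect it by default), and right-pads with index 0. The helper name `vectorize_and_pad` is hypothetical and not part of the gist.

```python
def vectorize_and_pad(seqs, vocab, pad_idx=0):
    """Encode strings as index lists, longest first, padded with pad_idx."""
    index = {ch: i for i, ch in enumerate(vocab)}
    encoded = [[index[ch] for ch in s] for s in seqs]
    # Longest sequence first, as the packing step expects.
    encoded.sort(key=len, reverse=True)
    lengths = [len(e) for e in encoded]
    max_len = max(lengths, default=0)
    padded = [e + [pad_idx] * (max_len - len(e)) for e in encoded]
    return padded, lengths


seqs = ['gigantic_string', 'tiny_str', 'medium_str']
# make <pad> idx 0
vocab = ['<pad>'] + sorted(set(''.join(seqs)))

padded, lengths = vectorize_and_pad(seqs, vocab)
print(lengths)
```

The resulting `padded` list and `lengths` list could then be converted to tensors and handed to the packing utilities.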
WeiTang114 /
Created Mar 13, 2017
Show username after each process in nvidia-smi.
# Show username after each process in nvidia-smi
# like:
# ...
# +------------------------------------------------------+
# | Processes:                                GPU Memory |
# |  GPU       PID  Type  Process name        Usage      |
# |======================================================|
# |    0    150752     C  python                  830MiB | User: user1
# |    1      2185     C  /usr/bin/python        1090MiB | User: user2
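One way to produce output like the sample above is to pull the PID out of each process row and ask the operating system who owns it. The sketch below is an assumption-laden illustration in Python, not the gist's script: the regular expression targets only the legacy row layout shown in the sample (newer `nvidia-smi` versions add columns), and the helper names `parse_process_row`, `owner_of`, and `annotate` are hypothetical. The owner lookup relies on the standard `ps -o user= -p <pid>` invocation.

```python
import re
import subprocess

# Matches rows such as: |    0    150752     C  python    830MiB |
PROCESS_ROW = re.compile(
    r"^\|\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s+(\d+)MiB\s+\|"
)


def parse_process_row(line):
    """Return the fields of a process row, or None for any other line."""
    m = PROCESS_ROW.match(line)
    if m is None:
        return None
    gpu, pid, ptype, name, mem = m.groups()
    return {"gpu": int(gpu), "pid": int(pid), "type": ptype,
            "name": name, "mem_mib": int(mem)}


def owner_of(pid):
    """Look up the user name owning pid via ps; None if not found."""
    out = subprocess.run(["ps", "-o", "user=", "-p", str(pid)],
                         capture_output=True, text=True)
    return out.stdout.strip() or None


def annotate(lines):
    """Yield each line, appending ' User: <name>' to process rows."""
    for line in lines:
        row = parse_process_row(line)
        if row is None:
            yield line
        else:
            yield f"{line} User: {owner_of(row['pid'])}"
```

Lines that are not process rows (borders, headers) pass through unchanged, so the rest of the `nvidia-smi` output keeps its layout.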
nathanielove /
Created Nov 1, 2016
How to set up Shadowsocks on your Ubuntu server

Your school or company network may block access to certain websites. To work around this, I highly recommend Shadowsocks: it is the easiest proxy tool I have found, and it is free (provided you already have your own server running).

First, ssh into your server and make sure you have Python and pip installed. If you have Python but not pip, install it with the following command:

$ sudo apt-get install python3-pip
tmdavid /
Last active Sep 20, 2019
Visualize word embeddings, using tsne.
First computes the cosine distance to the 100 closest words, then shows a clustering graph
of the 11 closest words (the first one is always the query word itself).
Line 31: glove_file = '../TBIR/glove.840B.300d.txt'; modify it to the appropriate path.
To use it, type: python <space-separated list of words>
e.g: python cake word embedding music
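The nearest-neighbour step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the gist's code: `embeddings` is taken to be a dict from word to a non-zero vector (the gist loads these from a GloVe file), and the names `cosine_distance` and `closest_words` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def closest_words(word, embeddings, k):
    """Return the k words nearest to `word` by cosine distance.

    The query word is part of the vocabulary, so it normally comes
    back first with distance 0.
    """
    query = embeddings[word]
    ranked = sorted(embeddings,
                    key=lambda w: cosine_distance(query, embeddings[w]))
    return ranked[:k]


# Toy 2-d vectors, made up for illustration only.
toy = {"cake": [1.0, 0.0], "pie": [0.9, 0.1], "music": [0.0, 1.0]}
print(closest_words("cake", toy, 2))
```

Sorting the whole vocabulary is fine for a toy dictionary; for a full GloVe vocabulary a vectorised or partial-sort approach would be much faster.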