(Bill) Yuchen Lin yuchenlin

:octocat:
View GitHub Profile
@yuchenlin
yuchenlin / clean_conceptnet.py
Created March 5, 2020 05:10
Cleaning ConceptNet
```
wget https://s3.amazonaws.com/conceptnet/downloads/2017/edges/conceptnet-assertions-5.5.5.csv.gz
gunzip -k conceptnet-assertions-5.5.5.csv.gz
```
import json
def del_pos(s):
    """
    Deletes part-of-speech encoding from an entity string, if present.
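The preview cuts off before the body of `del_pos`, so the author's actual logic is not shown. As a rough illustration of the idea only, the sketch below assumes the ConceptNet 5.5 concept URI layout `/c/<language>/<term>[/<pos>...]` and drops everything after the term. The name `strip_pos_from_uri` is hypothetical and is not taken from the gist.

```python
def strip_pos_from_uri(uri):
    """Return a concept URI without its part-of-speech suffix, if any.

    Hypothetical helper: assumes URIs shaped like /c/en/dog/n.
    """
    parts = uri.split("/")  # "/c/en/dog/n" -> ["", "c", "en", "dog", "n"]
    # Only concept URIs ("/c/...") carry an optional POS component.
    if len(parts) > 4 and parts[1] == "c":
        return "/".join(parts[:4])
    return uri


print(strip_pos_from_uri("/c/en/dog/n"))
print(strip_pos_from_uri("/r/IsA"))
```

Relation URIs such as `/r/IsA` and concepts without a suffix pass through unchanged under this assumption.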
@yuchenlin
yuchenlin / gpt_sent_prob.py
Last active May 21, 2023 17:12
Compute sentence probability using GPT-2 with huggingface transformers
import torch
from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import numpy as np
from scipy.special import softmax
def model_init(model_string, cuda):
    if model_string.startswith("gpt2"):
        tokenizer = GPT2Tokenizer.from_pretrained(model_string)
        model = GPT2LMHeadModel.from_pretrained(model_string)
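The preview ends before the scoring step. One common way to turn a causal language model's output into a sentence score, shown here as a dependency-free sketch rather than the gist's own code, is to sum the log-probability the model assigns to each token given the tokens before it. The function names and the plain-list `logits` layout (one row of vocabulary scores per position) are assumptions made for illustration.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def sentence_log_prob(logits, token_ids):
    """Sum log P(token[t+1] | tokens[:t+1]) over the sentence.

    Assumes logits[t] holds the model's scores for the token at
    position t + 1, as in a causal LM. The first token is not scored
    unless a start-of-text token was prepended.
    """
    total = 0.0
    for t in range(len(token_ids) - 1):
        total += log_softmax(logits[t])[token_ids[t + 1]]
    return total


# Toy example: uniform scores over a 4-token vocabulary.
uniform = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
print(sentence_log_prob(uniform, [1, 2, 3]))
```

With a real model, the rows would come from the `logits` of the language-model head for the encoded sentence; exponentiating the result gives the sentence probability.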
@yuchenlin
yuchenlin / masked_word_prediction_bert.py
Last active August 15, 2023 17:30
A simple example script for predicting masked words in a sentence using BERT.
import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM
import logging
logging.basicConfig(level=logging.INFO)  # OPTIONAL
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()
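The preview stops after the model is put in evaluation mode. The remaining step in masked word prediction is typically to read the vocabulary scores at the `[MASK]` position and keep the highest-scoring tokens. The sketch below illustrates only that ranking step on plain Python lists; `top_k_tokens`, `mask_logits`, and the toy vocabulary are hypothetical names and data, not part of the original script.

```python
def top_k_tokens(mask_logits, vocab, k=3):
    """Return the k vocabulary entries with the highest scores.

    mask_logits: scores for every vocabulary entry at the masked position.
    vocab: list mapping token id -> token string (same length as mask_logits).
    """
    ranked = sorted(range(len(mask_logits)),
                    key=lambda i: mask_logits[i],
                    reverse=True)
    return [vocab[i] for i in ranked[:k]]


# Toy scores standing in for the model output at the [MASK] position.
toy_vocab = ["cat", "dog", "car", "tree"]
toy_scores = [0.1, 2.0, -1.0, 1.5]
print(top_k_tokens(toy_scores, toy_vocab, k=2))
```

In the script's setting, the scores would presumably be the slice of the `BertForMaskedLM` output logits at the masked index, and the id-to-token mapping would come from the tokenizer.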