Allan Jie allanj
@allanj
allanj / example.jsonl
Created April 7, 2024 07:21
Example Reflexion Json (CodeLLaMA-7b-instruct)
{"name": "HumanEval_79_decimal_to_binary", "language": "py", "prompt": "def decimal_to_binary(decimal: int) -> str:\n \"\"\"You will be given a number in decimal form and your task is to convert it to\n binary format. The function should return a string, with each character representing a binary\n number. Each character in the string will be '0' or '1'.\n\n There will be an extra couple of characters 'db' at the beginning and at the end of the string.\n The extra characters are there to help with the format.\n\n Examples:\n >>> decimal_to_binary(15)\n 'db1111db'\n >>> decimal_to_binary(32)\n 'db100000db'\n \"\"\"\n", "doctests": "transform", "original": "/home/arjun/repos/nuprl/MultiPL-E/datasets/../datasets/originals-with-cleaned-doctests/HumanEval_79_decimal_to_binary.py", "prompt_terminology": "reworded", "stop_tokens": ["\ndef", "\n#", "\nif", "\nclass"], "entry_point": "decimal_to_binary", "test": "def check(candidate):\n assert candidate(0) == 'db0db'\n assert cand
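The record above carries a `stop_tokens` list alongside the prompt. A minimal sketch of how such a list is commonly used, assuming the tokens mark where a generated completion should be cut off; the helper `truncate_at_stop_tokens` and the sample completion are hypothetical and not part of the gist:

```python
import json

# A shortened stand-in for one JSONL line; only fields visible in the record above.
record_line = json.dumps({
    "name": "HumanEval_79_decimal_to_binary",
    "entry_point": "decimal_to_binary",
    "stop_tokens": ["\ndef", "\n#", "\nif", "\nclass"],
})


def truncate_at_stop_tokens(completion, stop_tokens):
    """Cut the completion at the earliest occurrence of any stop token."""
    cut = len(completion)
    for token in stop_tokens:
        idx = completion.find(token)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]


record = json.loads(record_line)

# Hypothetical model output: a function body followed by unrelated extra code.
completion = "    return 'db' + bin(decimal)[2:] + 'db'\n\ndef unrelated():\n    pass\n"
body = truncate_at_stop_tokens(completion, record["stop_tokens"])
```

Cutting at the earliest stop token keeps only the body of the target function, which can then be appended to the prompt and run against the record's `test` field.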
@allanj
allanj / bootstrap.py
Created February 7, 2024 09:10
Bootstrapping t-test
"""
This is a simple example showing how to calculate the p-value for the difference in accuracy between two models
Bootstrapping t-test
"""
import random
random.seed(42)
# assume the test set has 1000 samples
# we just create dummy results for the demo
groundtruth = [random.choice(['A', 'B', 'C']) for _ in range(1000)]
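The preview stops before the test itself. As a rough illustration, here is a one-sided paired bootstrap resampling test, which is one common way to compare two models' accuracy; it is not necessarily the procedure the gist implements. The function name, the dummy predictions, and the 80%/75% accuracy levels are assumptions made for this sketch:

```python
import random


def paired_bootstrap_p_value(gold, pred_a, pred_b, n_resamples=1000, seed=42):
    """Fraction of resampled test sets on which model A does not beat model B."""
    rng = random.Random(seed)
    n = len(gold)
    # Per-example difference: +1 if only A is correct, -1 if only B is correct.
    diffs = [int(a == g) - int(b == g) for g, a, b in zip(gold, pred_a, pred_b)]
    not_better = 0
    for _ in range(n_resamples):
        # Resample the test set with replacement, keeping A/B pairs together.
        total = sum(diffs[rng.randrange(n)] for _ in range(n))
        if total <= 0:
            not_better += 1
    return not_better / n_resamples


# Dummy data in the spirit of the gist (all values here are made up).
rng = random.Random(0)
labels = ["A", "B", "C"]
gold = [rng.choice(labels) for _ in range(1000)]
pred_a = [g if rng.random() < 0.80 else rng.choice(labels) for g in gold]
pred_b = [g if rng.random() < 0.75 else rng.choice(labels) for g in gold]

p_value = paired_bootstrap_p_value(gold, pred_a, pred_b)
print(f"estimated p-value that A is not better than B: {p_value:.3f}")
```

Resampling the per-example differences rather than the two prediction lists separately keeps the comparison paired, so both models are always scored on the same resampled examples.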
@allanj
allanj / demo_sft_with_accelerate.py
Last active January 11, 2024 13:05
demo_sft_script
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, PreTrainedTokenizerFast, set_seed, AutoModelForCausalLM, AutoConfig
from tqdm import tqdm
import argparse
import torch
import torch.nn as nn
import logging
from typing import Dict, Tuple
from accelerate import Accelerator, DistributedDataParallelKwargs
from accelerate.logging import get_logger
@allanj
allanj / command.md
Created August 15, 2022 02:36
Useful Commands in Linux

Kill processes whose command line contains a certain string

For example, kill every process whose command line contains python3 -u experiment_main.py

kill $(ps aux | grep '[p]ython3 -u experiment_main.py' | awk '{print $2}')
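The `[p]` in the grep pattern keeps grep from matching its own entry in the `ps` output: the regular expression still matches `python3`, but grep's own command line contains the literal text `[p]ython3`. A small Python sketch of that matching behaviour, using made-up command-line strings:

```python
import re

# The same pattern that is passed to grep in the command above.
pattern = re.compile(r"[p]ython3 -u experiment_main.py")

# Command lines as they might appear in `ps aux` output (illustrative strings).
target_cmd = "python3 -u experiment_main.py"
grep_cmd = "grep [p]ython3 -u experiment_main.py"

matches_target = pattern.search(target_cmd) is not None  # the real process
matches_grep = pattern.search(grep_cmd) is not None      # grep's own entry

print(matches_target, matches_grep)
```

Without the brackets, the grep process would appear in its own results and `kill` would also be handed grep's PID.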

Hadoop: list files by date

hdfs dfs -ls / | sort -k6,7
@allanj
allanj / BIOtoBIOES.py
Last active March 15, 2022 11:49
Convert the IOB2 tagging scheme to BIOES tagging scheme
def iob_iobes(tags):
    """
    IOB2 (BIO) -> IOBES
    """
    new_tags = []
    for i, tag in enumerate(tags):
        if tag == 'O':
            new_tags.append(tag)
        elif tag.split('-')[0] == 'B':
            if i + 1 != len(tags) and \
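The preview is cut off mid-condition. For reference, a self-contained sketch of an IOB2 to IOBES conversion, written independently of the gist. The function name `iob2_to_iobes` is hypothetical, and the rule used is the usual one: a tag that is not followed by an `I-` tag closes its span, so `B` becomes `S` and `I` becomes `E`.

```python
from typing import List


def iob2_to_iobes(tags: List[str]) -> List[str]:
    """Convert a valid IOB2 tag sequence to IOBES."""
    new_tags = []
    for i, tag in enumerate(tags):
        if tag == "O":
            new_tags.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # The span continues only if the next tag is an inside tag.
        continues = i + 1 < len(tags) and tags[i + 1].startswith("I-")
        if prefix == "B":
            new_tags.append(tag if continues else "S-" + label)
        elif prefix == "I":
            new_tags.append(tag if continues else "E-" + label)
        else:
            raise ValueError(f"Invalid IOB2 tag: {tag}")
    return new_tags


example = iob2_to_iobes(["B-PER", "I-PER", "O", "B-LOC"])
```

The sketch assumes the input is already valid IOB2, so an `I-` tag always carries the same label as the span it continues.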
@allanj
allanj / fairseq_gen.py
Created July 8, 2021 04:22
Fairseq Generation
import torch
from fairseq.models.bart import BARTModel
bart = BARTModel.from_pretrained(
    'model_files/bart-large-model',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='data/cloze_replace_all-bin'
)
bart.cuda()
@allanj
allanj / iob1toiob2_funct.py
Last active March 29, 2021 14:37
Convert the tags from IOB1 to IOB2 tagging scheme
"""
IOB1: O I I B I
IOB2: O B I B I
"""
from typing import List
def iob2(tags: List[str]):
    """
    Check that tags have a valid IOB format.
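The function body is not shown in the preview. A minimal sketch of the IOB1 to IOB2 conversion described in the docstring at the top of the gist, assuming typed tags such as `I-PER`; the name `iob1_to_iob2` is hypothetical and this is not the gist's own code:

```python
from typing import List


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Rewrite span-initial I- tags as B- tags, leaving other tags unchanged."""
    new_tags = []
    for i, tag in enumerate(tags):
        if tag == "O" or tag.startswith("B-"):
            new_tags.append(tag)
        elif tag.startswith("I-"):
            prev = tags[i - 1] if i > 0 else "O"
            # An I- tag opens a new span when it follows O or a different label.
            if prev == "O" or prev[2:] != tag[2:]:
                new_tags.append("B-" + tag[2:])
            else:
                new_tags.append(tag)
        else:
            raise ValueError(f"Invalid IOB1 tag: {tag}")
    return new_tags


# Typed version of the docstring example: O I I B I -> O B I B I
converted = iob1_to_iob2(["O", "I-PER", "I-PER", "B-PER", "I-PER"])
```

The comparison uses the previous tag from the original sequence, which is enough because rewriting `I-` to `B-` never changes a tag's label.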
@allanj
allanj / Random Images on Refresh
Created June 15, 2020 16:19 — forked from stephenscaff/Random Images on Refresh
Super simple way to randomly load new images on refresh via jQuery and DOM injection. Great for banners.
<!DOCTYPE html>
<head>
<!--Little CSS fade in -->
<style>
.fade-in{
-webkit-animation: fade-in 2s ease;
-moz-animation: fade-in ease-in-out 2s both;
-ms-animation: fade-in ease-in-out 2s both;
-o-animation: fade-in ease-in-out 2s both;
@allanj
allanj / Install
Created June 5, 2020 05:48 — forked from ines/Install
Streamlit + spaCy
pip install streamlit
pip install spacy
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_md
python -m spacy download de_core_news_sm
@allanj
allanj / coref_bert.jsonnet
Created October 16, 2019 03:30
Coreference with BERT, implemented using the latest AllenNLP package (0.9.0)
local bert_model = "bert-base-uncased";
local train_path = "./datasets/coref/train.english.v4_gold_conll";
local dev_path = "./datasets/coref/dev.english.v4_gold_conll";
local test_path = "./datasets/coref/test.english.v4_gold_conll";
{
  "dataset_reader": {
    "type": "coref",
    "token_indexers": {
      "bert": {