Foobar Protocol FoobarProtocol

@FoobarProtocol
FoobarProtocol / GPT_iteration.py
Created December 29, 2023 05:08
OpenAI Skeleton for Those Looking to Iteratively Employ ChatGPT
#!/usr/bin/env python3
import json
import openai
import time
import os
import logging
from openai.error import InvalidRequestError, RateLimitError  # openai<1.0 only; openai>=1.0 exposes RateLimitError and BadRequestError at the package top level
from concurrent.futures import ThreadPoolExecutor
@FoobarProtocol
FoobarProtocol / CodeT5_fine_tune_iteration.py
Created October 23, 2023 21:51
This is one iteration of the fine-tuning script for CodeT5+. Warning: I don't think this script is complete.
import argparse
import os
import torch
from accelerate import Accelerator
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training, set_peft_model_state_dict
from torch.utils.data import IterableDataset
from tqdm import tqdm
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, logging, set_seed
@FoobarProtocol
FoobarProtocol / Split_CSV_Files.py
Created October 21, 2023 23:48
A robust, self-written script that partitions CSV files based on user input.
import argparse
import csv
import os
# Function to calculate total rows in CSV
def get_total_rows(csv_file):
    with open(csv_file, 'r') as f:
        return sum(1 for row in csv.reader(f)) - 1  # Exclude header
# Function to split CSV files
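The preview stops before the splitting function itself. A minimal sketch of how such a split might work, assuming the user supplies a maximum number of data rows per part. The name `split_csv`, the `rows_per_file` parameter, and the `_partN.csv` naming scheme are hypothetical and not taken from the gist:

```python
import csv
import os


def split_csv(csv_file, rows_per_file, output_dir="."):
    """Split csv_file into parts of at most rows_per_file data rows.

    Hypothetical helper: the header row is repeated in every part.
    Returns the list of paths written.
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    base = os.path.splitext(os.path.basename(csv_file))[0]
    written = []
    with open(csv_file, "r", newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return written
        out, writer, count = None, None, 0
        try:
            for row in reader:
                # Start a new part on the first row or when the current part is full
                if writer is None or count == rows_per_file:
                    if out is not None:
                        out.close()
                    path = os.path.join(output_dir, f"{base}_part{len(written) + 1}.csv")
                    out = open(path, "w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    written.append(path)
                    count = 0
                writer.writerow(row)
                count += 1
        finally:
            if out is not None:
                out.close()
    return written
```

Streaming row by row keeps memory use flat regardless of file size, which is one reasonable design choice for large CSVs; the actual gist may do this differently.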
@FoobarProtocol
FoobarProtocol / DataPreprocessing.py
Created October 21, 2023 23:47
Comprehensive dataset preprocessing for Solidity smart contracts, thoroughly commented; the comments explain the logic behind every preprocessing decision I made. You're welcome if you've stumbled upon this #givingbacktothecommunity
# Importing necessary libraries for data preprocessing and visualization
import matplotlib.pyplot as plt
import numpy as np
from tqdm.notebook import trange
import pandas as pd
import random
import torch
import re
from datasets import load_dataset
from simplet5 import SimpleT5
@FoobarProtocol
FoobarProtocol / Datagen_Evolved_Seeds.py
Created October 21, 2023 23:46
High-quality code by Peyton Cleveland (from GitHub: https://github.com/PeytonCleveland/Fair-Use/blob/main/Darwin-JSTS/main.py)
import os
import logging
import csv
import argparse
from tqdm import tqdm
import sys
from dotenv import load_dotenv
import openai
logging.basicConfig(level=logging.INFO,
@FoobarProtocol
FoobarProtocol / Convert Alpaca to Evol Dataset.py
Created October 21, 2023 23:46
This brief piece of code outlines how to convert an Alpaca dataset to the Evol-Instruct format
def convert_alpaca_to_evol(
    file_path: str,
    lines: bool = False,
    output_file: str = "converted_alpaca.json"
):
    """Convert the Instruction/Input/Output format of Alpaca Instruct datasets
    to the Evol-Instruct format of Instruction/Output. Inputs are appended to the
    instructions.
    Args:
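The docstring above describes the conversion: each Alpaca record's input is appended to its instruction, leaving only Instruction/Output. A minimal sketch of that idea, assuming the usual lowercase Alpaca keys (`instruction`, `input`, `output`), a newline separator, and that `lines=True` means JSON Lines input. The helper names `alpaca_to_evol_record` and `convert_alpaca_file` are hypothetical, and the gist's actual body may differ:

```python
import json


def alpaca_to_evol_record(record, separator="\n"):
    """Merge an Alpaca record's input into its instruction.

    Assumes the keys "instruction", "input", "output"; the separator
    is an assumption, not something the gist preview specifies.
    """
    instruction = record["instruction"]
    extra = record.get("input", "")
    if extra:
        instruction = f"{instruction}{separator}{extra}"
    return {"instruction": instruction, "output": record["output"]}


def convert_alpaca_file(file_path, lines=False, output_file="converted_alpaca.json"):
    """Read an Alpaca dataset and write it out in Instruction/Output form.

    lines=True is assumed to mean one JSON record per line.
    """
    with open(file_path, "r", encoding="utf-8") as f:
        if lines:
            records = [json.loads(line) for line in f if line.strip()]
        else:
            records = json.load(f)
    converted = [alpaca_to_evol_record(r) for r in records]
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(converted, f, indent=2, ensure_ascii=False)
    return converted
```

Records with an empty `input` pass through with their instruction unchanged, so no trailing separator is added.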
@FoobarProtocol
FoobarProtocol / main_evol_instruct.py
Created October 21, 2023 23:45
This is the `main.py` from the Evol_Instruct repo that the researchers removed. It calls the different transformation prompts and is meant to be kept separate from the Airoboros technique.
import json
import random
from openai_access import call_chatgpt
from depth import createConstraintsPrompt, createDeepenPrompt, createConcretizingPrompt, createReasoningPrompt
from breadth import createBreadthPrompt
fr = open('alpaca_data.json','r')
@FoobarProtocol
FoobarProtocol / convert_to_conversation.py
Created October 21, 2023 23:45
This script does exactly what the name suggests: it converts instruction data to the conversation format.
import re
import json
import uuid
inputs = [json.loads(line) for line in open("instructions.jsonl").readlines()]
def split_response(instruction, response):
    if '</s>' not in response:
        return [
            {
                "from": "human",
@FoobarProtocol
FoobarProtocol / OpenAI_MultiThreaded_Req.py
Created October 21, 2023 23:44
A script that sends multithreaded requests to OpenAI so that prompts can be generated within the pipeline.
import openai
api_keys = ['api-key-1', 'api-key-2', 'api-key-3'] # Replace with your actual API keys
num_prompts = 1000
prompts_per_request = 100 # Adjust based on your needs
num_requests = 10
prompts = []
for i in range(num_requests):
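The preview cuts off at the request loop. One way the batching and threading could be arranged is sketched below, assuming the prompts are split into fixed-size batches and the API keys are used round-robin. Everything here is hypothetical: `chunk`, `run_batches`, and `fake_send_batch` are illustrative names, and `fake_send_batch` is a stand-in that makes no network call, so a real OpenAI request would have to be substituted for it:

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Yield consecutive slices of items with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_send_batch(api_key, batch):
    """Stand-in for a real API call; it only echoes each prompt."""
    return [f"response to: {prompt}" for prompt in batch]


def run_batches(prompts, api_keys, prompts_per_request,
                send_batch=fake_send_batch, max_workers=4):
    """Send batches concurrently, cycling through api_keys round-robin.

    Results are returned flattened, in the same order as prompts.
    """
    batches = list(chunk(prompts, prompts_per_request))
    keys = [api_keys[i % len(api_keys)] for i in range(len(batches))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even when calls finish out of order
        results = list(pool.map(send_batch, keys, batches))
    return [response for batch in results for response in batch]
```

Threads suit this workload because each request spends most of its time waiting on the network; `max_workers` would need to respect whatever rate limits apply to the keys in use.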
@FoobarProtocol
FoobarProtocol / Flan-T5-XXL_ContextWindow.py
Created October 21, 2023 23:44
Flan-T5-XXL normally has a fixed context window (512 tokens). This can be prohibitive, especially given the wide range of tasks the model can perform. However, one of the researchers from Google gave us the code for allowing the model to generate past the 512 `max_tokens` limit and understand well beyond the 512 co…
from transformers import AutoTokenizer, BitsAndBytesConfig, AutoModelForSeq2SeqLM
model_id = "google/flan-t5-xxl"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, quantization_config=quantization_config)