Kyle O'Brien Kyle1668

@Kyle1668
Kyle1668 / blocklist.py
Last active November 12, 2025 21:19
[WIP] Geodesic Blocklist for Alignment Data Filtering
"""Blocklist containing regex patterns for filtering AI safety-related content.
This module defines lists of terms (regular expression patterns) used to quickly
filter documents that may contain content related to AI safety, existential risks,
misaligned AI systems, and related concepts.
See the filters.py gist for the logic. Some of these blocklist categories have more advanced logic,
while for others, a document is filtered if it contains any of the key terms.
"""
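The simpler case described in the docstring above (filter a document if it contains any key term) could be sketched roughly as follows. This is a minimal illustration, not the gist's actual logic: the patterns in `EXAMPLE_PATTERNS` and the helper `is_blocked` are hypothetical names invented here, and the real blocklist and its more advanced per-category rules are not shown in this preview.

```python
import re

# Hypothetical example patterns; the real blocklist contents are not shown here.
EXAMPLE_PATTERNS = [
    r"\bmisaligned AI\b",
    r"\bexistential risks?\b",
]

# Compile once so repeated document checks do not re-parse the patterns.
COMPILED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in EXAMPLE_PATTERNS]


def is_blocked(text):
    """Return True if any blocklist pattern matches anywhere in the text."""
    return any(pattern.search(text) for pattern in COMPILED_PATTERNS)
```

Calling `is_blocked(document_text)` on each document gives a fast first-pass filter; categories that need more context than a single match would require additional logic on top of this.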
Kyle1668 / filter.py
Created February 13, 2025 22:53
Apply Filters to Dataset
from sklearn.metrics import classification_report, confusion_matrix
from datasets import load_dataset, concatenate_datasets, Dataset, DatasetDict
from vllm import LLM, SamplingParams
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from tqdm import tqdm
import torch.distributed as dist
from datetime import timedelta
import pandas as pd
import torch
import re
Kyle1668 / .py
Created May 2, 2024 18:30
Format ToxiGen Dataset
from datasets import load_dataset, DatasetDict, Dataset, BuilderConfig, GeneratorBasedBuilder
import pandas as pd
from tqdm import tqdm
import os
def format_line(raw_line):
    split_line = raw_line.split("\\n")
    max_length = 8
    if len(split_line) < max_length:
Kyle1668 / calulcate_perplexity.py
Last active March 24, 2023 01:43
Calculate the individual perplexities for each sequence in a batch. In this case, I'm using a `GPTNeoXForCausalLM` for inference.
def get_model_perplexities(split_name):
    memories = load_dataset("EleutherAI/pythia-memorized-evals")[split_name]
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = load_tokenizer(split_name)
    pythia_model = load_model(split_name)
    memories_dataset = HFMemoriesDataset(memories, tokenizer)
    data_loader = DataLoader(memories_dataset, batch_size=128)
    all_perplexities = []
    with torch.no_grad():
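The preview above is cut off before the perplexity computation itself. As a rough, framework-free sketch of the underlying idea, per-sequence perplexity is the exponential of the mean negative log-likelihood over that sequence's non-padding tokens. The function name `sequence_perplexities` and its inputs (per-token natural-log probabilities plus a 0/1 mask) are assumptions made for illustration; this is not the gist's code and it does not call `GPTNeoXForCausalLM`.

```python
import math


def sequence_perplexities(token_log_probs, attention_mask):
    """Return one perplexity per sequence.

    token_log_probs: list of lists of natural-log probabilities, one per token.
    attention_mask:  list of lists of 0/1 flags; 0 marks padding to ignore.
    """
    perplexities = []
    for log_probs, mask in zip(token_log_probs, attention_mask):
        # Keep only log-probabilities of real (non-padding) tokens.
        kept = [lp for lp, m in zip(log_probs, mask) if m]
        if not kept:
            # A fully padded sequence has no defined perplexity.
            perplexities.append(float("nan"))
            continue
        mean_nll = -sum(kept) / len(kept)
        perplexities.append(math.exp(mean_nll))
    return perplexities
```

In a batched model setting, the same computation would typically use an unreduced per-token loss and a masked mean along the sequence dimension, so that each row of the batch gets its own value instead of one batch-wide average.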
bierner.markdown-preview-github-styles
bungcip.better-toml
ckolkman.vscode-postgres
DavidAnson.vscode-markdownlint
dbaeumer.vscode-eslint
Equinusocio.vsc-material-theme
esbenp.prettier-vscode
karyfoundation.comment
mauve.terraform
ms-mssql.mssql
{
  "workbench.colorTheme": "Material Theme",
  "workbench.colorCustomizations": {
    "terminal.foreground": "#C3FF99"
  },
  "files.autoSave": "afterDelay",
  "editor.formatOnSave": true,
  "javascript.format.enable": false,
  "eslint.autoFixOnSave": true,
  "eslint.alwaysShowStatus": true,
import requests
import json
import os
headers = {
    "developerkey": os.environ["REMOTEIT_DEVELOPER_KEY"],
    "token": os.environ["REMOTEIT_TOKEN"]
}
body = {
    "deviceaddress": "80:00:00:3F:AE:00:00:11"
import requests
import json
import os
headers = {
    "developerkey": os.environ["REMOTEIT_DEVELOPER_KEY"]
}
body = {
    "password": os.environ["REMOTEIT_PASSWORD"],
    "username": os.environ["REMOTEIT_USERNAME"]
import requests
import os
headers = {
    "developerkey": os.environ["REMOTEIT_DEVELOPER_KEY"],
    "token": os.environ["REMOTEIT_TOKEN"]
}
url = "https://api.remot3.it/apv/v27/device/list/all"
Kyle1668 / connection.cs
Created January 11, 2019 20:24
connection.cs
using System;
using System.Net.Http;
using Newtonsoft.Json;
using System.Collections.Generic;
namespace remote.it_api_example
{
    class Program
    {
        static void Main(string[] args)