Joseph Szymborski jszym

@jszym
jszym / stream_fasta.py
Last active July 28, 2024 21:53
Streaming FASTA file
"""
Copyright (C) 2024 by Joseph Szymborski (jszym.com)
Permission to use, copy, modify, and/or distribute this software for
any purpose with or without fee is hereby granted.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS
ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
jszym / namespaced_token.py
Created June 28, 2023 17:57
Namespaced, random token.
from base64 import urlsafe_b64encode
from blake3 import blake3
import krock32
from time import time
import secrets
import math
def generate_token(namespace: str) -> str:
    """
jszym / quickdraw.py
Created February 20, 2023 16:34
PyTorch QuickDraw dataset
# adapted from https://github.com/nateraw/quickdraw-pytorch/blob/main/quickdraw.ipynb
from typing import List, Optional
import urllib.request
from tqdm.auto import tqdm
from pathlib import Path
import requests
import torch
import math
import numpy as np
jszym / dictlogger.py
Created February 13, 2023 00:46
Dictionary Logger for PyTorch Lightning
from pytorch_lightning.utilities import rank_zero_only
from pytorch_lightning.loggers import Logger
from pytorch_lightning.loggers.logger import rank_zero_experiment
from collections import defaultdict
class DictLogger(Logger):
    def __init__(self):
        super().__init__()
jszym / split.py
Created June 4, 2020 17:38
Given a class=folder structure, compute splits with sklearn
# a library for discovering paths
from glob import glob
from sklearn.model_selection import train_test_split
# you may need to look up the documentation for glob
# "*" is a stand-in for any string
# this assumes that the subfolders are in the same folder as the script
# if the subfolders were in a folder "data", the argument to glob would be
# "./data/*.png"
paths = glob("./*/*.png")
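The preview above stops after the `glob` call. As a minimal sketch of the class-per-folder idea in the comments, assuming each PNG's label is simply the name of its parent folder, the paths and labels could be gathered like this. The helper name `paths_and_labels` and the `root` parameter are hypothetical and do not come from the gist.

```python
from glob import glob
from pathlib import Path


def paths_and_labels(root: str = "."):
    """Return (paths, labels) for PNG files stored one level below `root`.

    Assumption: every subfolder of `root` is one class, so the label of a
    file is the name of the folder that contains it.
    """
    # "*" matches any folder name, "*.png" matches any PNG inside that folder.
    pattern = str(Path(root) / "*" / "*.png")
    # Sort so the order is reproducible across runs and file systems.
    paths = sorted(glob(pattern))
    labels = [Path(p).parent.name for p in paths]
    return paths, labels
```

The two lists line up index by index, so they could then be handed to `train_test_split` from the import shown above, for example with `stratify=labels` to keep class proportions similar in each split.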
jszym / clean_trackers_url.py
Created September 2, 2019 01:20
A quick script to get rid of Google (UTM) tracking, as well as the tracking query strings on NYTimes URLs.
from urllib.parse import parse_qs, urlparse, urlencode, urlunparse
import copy
def clean_trackers_url(url):
    url_obj = urlparse(url)
    raw_query = parse_qs(url_obj.query)
    clean_query = copy.deepcopy(raw_query)
    # add query keys to ban (exact matches)
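The preview is cut off before the ban list, so the gist's actual keys are not visible here. The following is only a sketch of the general approach the description suggests: parse the query string, drop tracking keys, and rebuild the URL. The function name `strip_tracking_params`, the `utm_` prefix rule, and the `smid` key (assumed here to be an NYTimes share parameter) are illustrative assumptions, not the author's list.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Hypothetical ban lists; the original gist's entries are not shown in the preview.
BANNED_KEYS = {"smid"}          # exact matches (assumed NYTimes share key)
BANNED_PREFIXES = ("utm_",)     # prefix matches (Google UTM parameters)


def strip_tracking_params(url: str) -> str:
    """Return `url` with banned query parameters removed."""
    parts = urlparse(url)
    # parse_qsl keeps parameter order and repeated keys as (key, value) pairs.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in BANNED_KEYS and not key.startswith(BANNED_PREFIXES)
    ]
    # Rebuild the URL with only the surviving parameters; other parts are untouched.
    return urlunparse(parts._replace(query=urlencode(kept)))
```

Using `parse_qsl` rather than `parse_qs` avoids the need for a deep copy, since the filtered list is built fresh instead of mutating a parsed dictionary.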

Keybase proof

I hereby claim:

  • I am jszym on github.
  • I am jszym (https://keybase.io/jszym) on keybase.
  • I have a public key whose fingerprint is 9961 76AC EF9F 41DA EF59 8151 AAFD DADA 459E F326

To claim this, I am signing this object:

jszym / validate_url.js
Created January 13, 2019 03:41
Function to validate URLs
/**
VALIDATE URL
------------------------------------------------------
Requires punycode.js found at https://mths.be/punycode
to handle UTF-8. Has a very high true-positive rate,
and low false-positive rate on this test-suite
https://mathiasbynens.be/demo/url-regex
**/
function validate_url(link){