Skip to content

Instantly share code, notes, and snippets.


Joseph Szymborski jszym

View GitHub Profile
jszym /
Created Jun 4, 2020
Given a class=folder structure, compute splits with sklearn
# a library for discovering paths
from glob import glob
from sklearn.model_selection import train_test_split
# you may need to look up the documentation for glob
# "*" is a stand=in for any string
# this assumes that the subfolders are in the same folder as the script
# if the subfolders were in a folder "data", the argument to glob would be
# "./data/*.png"
paths = glob("./*/*.png")
jszym /
Created Sep 2, 2019
A quick script to get rid of Google (UTM) tracking, as well as the tracking query strings on NYTimes URLs.
from urllib.parse import parse_qs, urlparse, urlencode, urlunparse
import copy
def clean_trackers_url(url):
url_obj = urlparse(url)
raw_query = parse_qs(url_obj.query)
clean_query = copy.deepcopy(raw_query)
# add query keys to ban (exact matches)

Keybase proof

I hereby claim:

  • I am jszym on github.
  • I am jszym ( on keybase.
  • I have a public key whose fingerprint is 9961 76AC EF9F 41DA EF59 8151 AAFD DADA 459E F326

To claim this, I am signing this object:

jszym / validate_url.js
Created Jan 13, 2019
Function to validate URLs
View validate_url.js
Requires punycode.js found at
to handle UTF-8. Has a very high true-positive rate,
and low false-positive rate on this test-suite
function validate_url(link){