Joseph Szymborski jszym

@jszym
jszym / split.py
Created Jun 4, 2020
Given a class-per-folder directory structure, compute train/test splits with sklearn
View split.py
# a library for discovering paths
from glob import glob
from sklearn.model_selection import train_test_split
# see the glob documentation for the full pattern syntax
# "*" is a stand-in for any string
# this assumes that the subfolders are in the current working directory
# (usually the folder the script is run from)
# if the subfolders were in a folder "data", the argument to glob would be
# "./data/*/*.png"
paths = glob("./*/*.png")
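
The preview above stops after collecting the paths. A minimal sketch of how the split might continue, assuming each image's class is the name of the folder that directly contains it. The helper names `label_from_path` and `split_paths`, the 80/20 ratio, and the fixed seed are illustrative assumptions, not part of the original gist:

```python
from pathlib import Path


def label_from_path(path):
    """Return the name of the folder that directly contains the file."""
    return Path(path).parent.name


def split_paths(paths, test_size=0.2, seed=0):
    """Split paths and their folder-derived labels into train and test sets.

    Returns (train_paths, test_paths, train_labels, test_labels).
    """
    # Imported here so label_from_path stays usable without scikit-learn.
    from sklearn.model_selection import train_test_split

    labels = [label_from_path(p) for p in paths]
    # stratify=labels keeps class proportions similar in both subsets;
    # scikit-learn raises ValueError if a class has fewer than two files.
    return train_test_split(
        paths, labels, test_size=test_size, random_state=seed, stratify=labels
    )
```

With the `paths` list from the snippet above, `split_paths(paths)` would return the four lists in the order scikit-learn uses: training paths, test paths, training labels, test labels.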
jszym / clean_trackers_url.py
Created Sep 2, 2019
A quick script to get rid of Google (UTM) tracking, as well as the tracking query strings on NYTimes URLs.
View clean_trackers_url.py
from urllib.parse import parse_qs, urlparse, urlencode, urlunparse
import copy
def clean_trackers_url(url):
    url_obj = urlparse(url)
    raw_query = parse_qs(url_obj.query)
    clean_query = copy.deepcopy(raw_query)
    # add query keys to ban (exact matches)
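
The preview is cut off before the banned keys are listed. The following is a sketch of the same idea under stated assumptions, not the author's implementation: the function name `strip_tracking_params` and the `BANNED_KEYS` set are hypothetical, and the set contains only the five standard Google UTM parameters. The NYTimes-specific keys are not shown in the preview, so none are guessed here.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Assumed ban list: the five standard UTM parameters (exact matches only).
BANNED_KEYS = frozenset(
    {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}
)


def strip_tracking_params(url, banned=BANNED_KEYS):
    """Return url with every query key found in `banned` removed."""
    url_obj = urlparse(url)
    # keep_blank_values=True preserves keys such as "?flag=" in the output.
    query = parse_qs(url_obj.query, keep_blank_values=True)
    clean_query = {key: values for key, values in query.items() if key not in banned}
    # doseq=True writes repeated keys back as key=a&key=b.
    return urlunparse(url_obj._replace(query=urlencode(clean_query, doseq=True)))
```

Building a filtered dictionary avoids the deep copy used in the preview, since nothing is deleted from the parsed query in place. Note that `urlencode` re-encodes the remaining values, so the surviving query string may differ in escaping from the input.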
View keybase.md

Keybase proof

I hereby claim:

  • I am jszym on github.
  • I am jszym (https://keybase.io/jszym) on keybase.
  • I have a public key whose fingerprint is 9961 76AC EF9F 41DA EF59 8151 AAFD DADA 459E F326

To claim this, I am signing this object:

jszym / validate_url.js
Created Jan 13, 2019
Function to validate URLs
View validate_url.js
/**
VALIDATE URL
------------------------------------------------------
Requires punycode.js found at https://mths.be/punycode
to handle UTF-8. Has a very high true-positive rate,
and low false-positive rate on this test-suite
https://mathiasbynens.be/demo/url-regex
**/
function validate_url(link){