Skip to content

Instantly share code, notes, and snippets.

@allenday
allenday / canopy.py
Created February 5, 2024 05:55 — forked from gdbassett/canopy.py
Efficient python implementation of canopy clustering. (A method for efficiently generating centroids and clusters, most commonly as input to a more robust clustering algorithm.)
from sklearn.metrics.pairwise import pairwise_distances
import numpy as np
# X shoudl be a numpy matrix, very likely sparse matrix: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csr_matrix.html#scipy.sparse.csr_matrix
# T1 > T2 for overlapping clusters
# T1 = Distance to centroid point to not include in other clusters
# T2 = Distance to centroid point to include in cluster
# T1 > T2 for overlapping clusters
# T1 < T2 will have points which reside in no clusters
# T1 == T2 will cause all points to reside in mutually exclusive clusters
@allenday
allenday / bq-geohash-udf.sql
Created July 2, 2021 02:59 — forked from vladaman/bq-geohash-udf.sql
Google BigQuery standardSQL UDF function to decode geohash string into bounding box and coordinates
#standardSQL
CREATE TEMPORARY FUNCTION geohashDecode(geohash STRING)
RETURNS STRUCT<bbox ARRAY<FLOAT64>, lat FLOAT64, lng FLOAT64>
LANGUAGE js AS """
if (!geohash) return null;
var base32 = '0123456789bcdefghjkmnpqrstuvwxyz';
geohash = geohash.toLowerCase();
var evenBit = true;
var latMin = -90, latMax = 90;
@allenday
allenday / tensorflow_json_parser.py
Last active January 2, 2020 09:56 — forked from sameerg07/tensorflow_json_parser.py
converts the json file downloaded using image classifer tool of dataturks to dataset folder
#This script has been solely created under dataturks. Copyrights are reserved
#EXAMPLE USAGE
#python3 tensorflow_json_parser.py --json_file "flower.json" --dataset_path "Dataset5/"
import json
import glob
import urllib.request
@allenday
allenday / .block
Last active January 25, 2019 09:10 — forked from mbostock/.block
Force Layout from CSV
license: gpl-3.0