Tom McTavish tommct

@tommct
tommct / README.md
Last active September 11, 2016 17:55
Agglomerative Filtering Recipe for Python Sklearn using similarity matrix

This is a recipe for using Sklearn to build a cosine similarity matrix and then to build dendrograms from it.

import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy
import scipy.spatial.distance
from scipy.spatial.distance import pdist
from sklearn.metrics.pairwise import cosine_similarity

Make a "feature matrix" of 15 items that will be the binary representation of each index.
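The steps described above can be sketched end to end as follows. This is a minimal sketch, not the gist's own code: it assumes the 15 items are the indices 1 through 15 encoded in 4 bits (so no row is all zeros), and it assumes average linkage on the distance 1 - similarity. The names `features`, `similarity`, `condensed`, and `Z` are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics.pairwise import cosine_similarity

# Assumption: items are the indices 1..15, each encoded as a 4-bit binary row.
n_items, n_bits = 15, 4
indices = np.arange(1, n_items + 1)
shifts = np.arange(n_bits - 1, -1, -1)  # most significant bit first
features = ((indices[:, None] >> shifts) & 1).astype(float)

# Pairwise cosine similarity between the rows of the feature matrix.
similarity = cosine_similarity(features)

# Hierarchical clustering works on distances, so convert similarity to distance.
distance = 1.0 - similarity
np.fill_diagonal(distance, 0.0)          # remove floating-point noise on the diagonal
distance = np.clip(distance, 0.0, None)  # guard against tiny negative values

# linkage() expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(distance, checks=False)
Z = linkage(condensed, method="average")  # assumption: average linkage

# no_plot=True returns the tree layout without drawing; drop it to plot with matplotlib.
tree = dendrogram(Z, no_plot=True, labels=[str(i) for i in indices])
```

Passing `no_plot=True` keeps the sketch usable without a display; removing it draws the dendrogram on the current matplotlib axes.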

@tommct
tommct / README.md
Last active March 15, 2021 17:39
Tableau Box Plots and Histograms

This is a recipe for making box plots overlaying histograms in Tableau version 9.3. It largely borrows from http://vizpainter.com/some-tableau-tips-options-for-box-and-whisker/ and http://vizdiff.blogspot.com/2015/11/overlaying-histogram-with-box-and.html.

  1. Create a fixed continuous variable for number of objects per dimension. For example, the number of unique assignments per user:

     [Assignments Per User] = {FIXED [Userid] : COUNTD([Assignmentid])}
    
  2. Set the variable's Default Aggregation to COUNT.

  3. Drag the variable from Measures to the columns shelf.

  4. Set it to "Dimension" instead of CNT().
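For readers who want to check the calculated field outside Tableau, the FIXED expression in step 1 can be mirrored in pandas. This is only a sketch on made-up data; the column names follow the example above, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one row per (user, assignment) event.
df = pd.DataFrame({
    "Userid": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "Assignmentid": ["a1", "a2", "a1", "a1", "a3", "a2"],
})

# Like {FIXED [Userid] : COUNTD([Assignmentid])}: compute the distinct count
# per user and attach that value to every row belonging to the user.
df["Assignments Per User"] = (
    df.groupby("Userid")["Assignmentid"].transform("nunique")
)
```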

@tommct
tommct / columnviamerge.py
Created July 20, 2017 18:23
Add columns to Pandas DataFrame by (left) merging with another.
def columns_via_merge(df: pd.DataFrame, df2: pd.DataFrame, oncols: list, assigning: list):
    """
    Add (or replace) columns to df that map via a merge with df2.
    Examples:
    # Add the ord value to a subset of a DataFrame
    ABC = [chr(x) for x in range(ord('A'), ord('Z') + 1)]
    AABBCC = [chr(x)+chr(x) for x in range(ord('A'), ord('Z') + 1)]
    abc = [chr(x) for x in range(ord('a'), ord('z') + 1)]
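The preview above is cut off before the function body, so the following is only a sketch of the general idea (adding or replacing columns through a left merge), not the gist's implementation. The function name `add_columns_via_left_merge` and its behavior on duplicate keys are assumptions.

```python
import pandas as pd

def add_columns_via_left_merge(df, df2, oncols, assigning):
    """Hypothetical helper: return a copy of df with the `assigning` columns
    taken from df2, matched on `oncols` with a left merge."""
    # Keep one row per key in the lookup so the left merge cannot add rows.
    lookup = df2[oncols + assigning].drop_duplicates(subset=oncols)
    # Drop columns that are about to be replaced to avoid _x/_y suffixes.
    base = df.drop(columns=[c for c in assigning if c in df.columns])
    merged = base.merge(lookup, on=oncols, how="left")
    # merge() resets the index; restore the original one (row order is preserved).
    merged.index = df.index
    return merged

# Usage: add the ord value to a subset of a DataFrame of letters.
letters = [chr(x) for x in range(ord("A"), ord("Z") + 1)]
lookup = pd.DataFrame({"letter": letters, "ord": [ord(c) for c in letters]})
subset = pd.DataFrame({"letter": ["C", "A", "B"]}, index=[10, 11, 12])
result = add_columns_via_left_merge(subset, lookup, ["letter"], ["ord"])
```

Rows in `df` without a match in `df2` receive NaN in the assigned columns, which is the usual left-merge behavior.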
@tommct
tommct / README.md
Last active January 9, 2022 09:02
Instructions for downloading Jupyter Notebooks from Coursera

From an open Jupyter Notebook homework assignment, select "Coursera" to go to the home page. Make a new notebook, fill a cell with the following, and execute it:

%%bash
tar cvfz hw.tar.gz .

This may take a little while to run depending on the packages. Select "Coursera" again to take you to the Home directory. Check the hw.tar.gz file and then Download. After the file is downloaded, delete it.
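If the `%%bash` magic is not available, a similar archive could be built from a regular Python cell with the standard-library `tarfile` module. This is a sketch of an alternative, not part of the original instructions; the helper name `archive_directory` is hypothetical, and unlike the `tar` command above it skips the archive file itself.

```python
import os
import tarfile

def archive_directory(root=".", archive_name="hw.tar.gz"):
    """Hypothetical helper: gzip-compress everything under `root` into
    root/archive_name and return the archive path."""
    archive_path = os.path.join(root, archive_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(root)):
            if entry == archive_name:
                continue  # do not add the archive to itself
            # tar.add() recurses into directories by default.
            tar.add(os.path.join(root, entry), arcname=entry)
    return archive_path
```

Calling `archive_directory()` from the notebook's working directory produces `hw.tar.gz`, which can then be checked and downloaded as described above.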

@tommct
tommct / README.md
Last active January 10, 2018 18:48
Matplotlib normalized histograms

This creates a histogram in matplotlib normalized to probability mass, so that the bar heights sum to 1.

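import numpy as np
import matplotlib.pyplot as plt

# df is assumed to be an existing pandas DataFrame.
bins = np.linspace(-1, 1, 101)  # 101 edges give 100 equal-width bins on [-1, 1]
# density=True returns a probability density (the area under the bars is 1).
hist, bins = np.histogram(df['some_column'], bins=bins, density=True)
width = bins[1] - bins[0]
# Multiply by the bin width to convert density to probability mass (heights sum to 1).
hist *= width
fig = plt.figure(figsize=(8, 4))
ax = fig.add_axes([.15, .15, .75, .75])

# bins[:-1] holds the left bin edges, so align the bars on their edges.
# Recent matplotlib names the first argument x rather than left, so pass it positionally.
plt.bar(bins[:-1], hist, width=width, align='edge')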
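The relationship between the density returned by `np.histogram` and per-bin probability mass can be checked numerically. The sketch below uses synthetic uniform data as a stand-in for `df['some_column']`; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(-1, 1, size=10_000)  # synthetic stand-in for df['some_column']

bins = np.linspace(-1, 1, 101)  # 100 equal-width bins
density, edges = np.histogram(values, bins=bins, density=True)
width = edges[1] - edges[0]

# Density integrates to 1, so density * width is the probability mass per bin.
mass = density * width

# The same mass computed directly from raw counts, for comparison.
counts, _ = np.histogram(values, bins=bins)
mass_from_counts = counts / counts.sum()
```

Both `mass` and `mass_from_counts` sum to 1 up to floating-point error, which is the property a probability mass histogram should have.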

@tommct
tommct / README.md
Created August 28, 2018 23:02
MongoDB from Tableau

To use MongoDB from Tableau, start a mongosqld instance:

mongosqld --mongo-uri "mongodb://<host>:<port>/?connect=direct"

Then from Tableau, select Servers->MongoDB BI Connector with 127.0.0.1 and 3307 as connection details.

@tommct
tommct / jupyterthemes.ipynb
Last active December 23, 2020 17:27
Jupyter Themes
@tommct
tommct / walkcollection.py
Created April 30, 2021 16:39
Walk a collection, like a JSON object, using a callback.
import logging
from collections.abc import Iterable
def is_container(obj):
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes, bytearray))
# https://stackoverflow.com/a/54000999/394430
def walk_collection(obj, callback=None, _path: list=[], **kwargs):
    """Walk an arbitrarily nested structure of lists and/or dicts such as would be made when
    reading JSON as an object. Walking is performed in a depth-first search manner.
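Since the preview stops inside the docstring, here is a minimal sketch of a depth-first walk with a callback. It is not the gist's `walk_collection`: the name `walk_nested`, the `(path, value)` callback signature, and the choice to call the callback only on leaf values are all assumptions.

```python
from collections.abc import Iterable, Mapping

def is_container(obj):
    # Strings and bytes are iterable but are treated as leaf values here.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes, bytearray))

def walk_nested(obj, callback, path=()):
    """Hypothetical depth-first walk: call callback(path, value) on every leaf,
    where path is the tuple of keys and indexes leading to the value."""
    if isinstance(obj, Mapping):
        items = obj.items()
    elif is_container(obj):
        items = enumerate(obj)
    else:
        callback(path, obj)
        return
    for key, value in items:
        # Build a new tuple per level so sibling branches never share state.
        walk_nested(value, callback, path + (key,))

# Usage: collect every leaf together with its path.
visited = []
data = {"a": [1, {"b": "x"}], "c": 2}
walk_nested(data, lambda path, value: visited.append((path, value)))
```

Using an immutable tuple for the path avoids the shared-state pitfalls of a mutable default argument.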
@tommct
tommct / dijkstra.ipynb
Last active April 30, 2021 17:06
Generic Dijkstra's shortest paths implementation in Python using a priority queue with callback functionality as it visits nodes.
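The notebook itself is not shown here, so the following is only a minimal sketch of the technique named in the description: Dijkstra's algorithm with a `heapq` priority queue and a callback invoked when a node is finalized. The adjacency-dict graph format, the function name `dijkstra`, and the `on_visit(node, distance)` signature are assumptions, not the author's API.

```python
import heapq

def dijkstra(graph, source, on_visit=None):
    """Hypothetical sketch. graph maps node -> {neighbor: non-negative weight}.
    Returns a dict of shortest distances from source to each reachable node."""
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale queue entry; a shorter path was already finalized
        done.add(node)
        if on_visit is not None:
            on_visit(node, d)  # called once per node, in order of distance
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Usage: record the order in which nodes are finalized.
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
order = []
distances = dijkstra(graph, "A", on_visit=lambda node, d: order.append(node))
```

Pushing duplicate entries and skipping stale ones on pop is a common way to work around `heapq` having no decrease-key operation.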
@tommct
tommct / MNIST_PCA.ipynb
Created April 30, 2021 23:30
PCA exploration in Python with the MNIST database