tommct /
Created Aug 28, 2018
MongoDB from Tableau

To use MongoDB from Tableau, start a mongosqld instance:

mongosqld --mongo-uri "mongodb://<host>:<port>/?connect=direct"

Then, from Tableau, select Servers->MongoDB BI Connector and enter the mongosqld host and port 3307 as the connection details.
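
Before configuring Tableau, it can help to confirm that mongosqld is actually listening. The following is a minimal sketch, not part of the original gist: `port_is_open` is a hypothetical helper that only tests whether a TCP connection succeeds, and it assumes mongosqld is using port 3307.

```python
import socket


def port_is_open(host, port=3307, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `port_is_open("<host>")` with the machine running mongosqld should return True once the instance is up.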

tommct /
Last active Jan 10, 2018
Matplotlib normalized histograms

This creates a normalized mass density histogram in matplotlib

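import numpy as np
import matplotlib.pyplot as plt

bins = np.linspace(-1, 1, 101)
# density=True makes the histogram integrate to 1 over the bins
hist, bins = np.histogram(df['some_column'], bins=bins, density=True)
width = bins[1] - bins[0]
# Multiply the density by the bin width so the bar heights are masses that sum to 1
hist *= width
fig = plt.figure(figsize=(8, 4))
ax = fig.add_axes([.15, .15, .75, .75])
# bins[:-1] holds the left bin edges, so align the bars on their edges
ax.bar(bins[:-1], height=hist, width=width, align='edge')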
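
As a check on the normalization, here is a minimal sketch with synthetic data standing in for `df['some_column']` (the uniform sample and the seed are assumptions for illustration). It computes the per-bin mass in two ways: by scaling the density by the bin widths, and by giving every sample a weight of 1/N. The two agree as long as all samples fall inside the bin range.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(-1, 1, size=1000)  # synthetic stand-in for df['some_column']
bins = np.linspace(-1, 1, 101)

# Route 1: density integrates to 1, so density * bin width is the mass per bin.
density, edges = np.histogram(values, bins=bins, density=True)
mass_from_density = density * np.diff(edges)

# Route 2: weight every sample by 1/N so the counts are already masses.
weights = np.full(values.shape, 1.0 / values.size)
mass_from_weights, _ = np.histogram(values, bins=bins, weights=weights)

print(mass_from_density.sum(), mass_from_weights.sum())
```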
tommct /
Last active Sep 27, 2020
Instructions for downloading Jupyter Notebooks from Coursera

From an open Jupyter Notebook homework assignment, select "Coursera" to take you to the home page. Make a new notebook, paste the following into a cell, and execute it:

!tar cvfz hw.tar.gz .

This may take a little while to run depending on the packages. Select "Coursera" again to take you to the Home directory. Check the hw.tar.gz file and then Download. After the file is downloaded, delete it.
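
If shell commands are not available in the notebook, the same archive can be built with Python's standard `tarfile` module. This is a sketch under that assumption, not the method from the original instructions; `archive_directory` is a hypothetical helper, and it skips the archive file itself so the tarball does not try to include a copy of itself.

```python
import os
import tarfile


def archive_directory(src_dir=".", archive_name="hw.tar.gz"):
    """Pack everything in src_dir (except the archive itself) into a .tar.gz file."""
    archive_path = os.path.join(src_dir, archive_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(src_dir)):
            if entry == archive_name:
                continue  # do not add the archive to itself
            # tar.add recurses into directories by default.
            tar.add(os.path.join(src_dir, entry), arcname=entry)
    return archive_path
```

Running `archive_directory()` in a notebook cell writes `hw.tar.gz` into the current directory, ready to be checked and downloaded as described above.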

tommct /
Created Jul 20, 2017
Add columns to Pandas DataFrame by (left) merging with another.
import pandas as pd

def columns_via_merge(df: pd.DataFrame, df2: pd.DataFrame, oncols: list, assigning: list):
    """Add (or replace) columns to df that map via a merge with df2."""
# Add the ord value to a subset of a DataFrame
ABC = [chr(x) for x in range(ord('A'), ord('Z') + 1)]
AABBCC = [chr(x)+chr(x) for x in range(ord('A'), ord('Z') + 1)]
abc = [chr(x) for x in range(ord('a'), ord('z') + 1)]
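
The body of the function is not shown in this preview. One way such a left merge could work is sketched below; `columns_via_merge_sketch` is a hypothetical stand-in rather than the author's implementation, and it assumes that `oncols` and `assigning` do not overlap and that only the first row per key in `df2` should be used.

```python
import pandas as pd


def columns_via_merge_sketch(df, df2, oncols, assigning):
    """Return a copy of df with the `assigning` columns looked up from df2 via a left merge."""
    # Drop columns that are about to be replaced, so merge adds no _x/_y suffixes.
    base = df.drop(columns=[c for c in assigning if c in df.columns])
    # Keep one row per key so the left merge cannot duplicate rows of df.
    lookup = df2[oncols + assigning].drop_duplicates(subset=oncols)
    merged = base.merge(lookup, on=oncols, how="left")
    # A left merge keeps the left row order, so the original index can be restored.
    merged.index = df.index
    return merged


# Add the ord value to a small DataFrame of letters.
ABC = [chr(x) for x in range(ord('A'), ord('Z') + 1)]
lookup_table = pd.DataFrame({"letter": ABC, "ord": [ord(c) for c in ABC]})
letters = pd.DataFrame({"letter": ["C", "A", "B", "?"]}, index=[10, 11, 12, 13])
result = columns_via_merge_sketch(letters, lookup_table, ["letter"], ["ord"])
print(result)
```

Rows whose key has no match in `df2` (the "?" row here) receive NaN in the assigned columns.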
tommct /
Last active Aug 24, 2016
Tableau Box Plots and Histograms

This is a recipe for making box plots overlaying histograms in Tableau version 9.3. It largely borrows from earlier write-ups.

  1. Create a fixed continuous variable for number of objects per dimension. For example, the number of unique assignments per user:

     [Assignments Per User] = {FIXED [Userid] : COUNTD([Assignmentid])}

  2. Set the variable's Default Aggregation to COUNT.

  3. Drag the variable from Measures to the columns shelf.

  4. Set it to "Dimension" instead of CNT().
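
To sanity-check the numbers outside Tableau, the FIXED expression in step 1 corresponds roughly to a group-by with a distinct count in pandas. The sketch below uses made-up data; the column names mirror the Tableau fields, and everything else is an assumption for illustration.

```python
import pandas as pd

# Made-up rows: one row per (user, assignment) event.
events = pd.DataFrame({
    "Userid": [1, 1, 1, 2, 2, 3],
    "Assignmentid": ["a", "b", "a", "a", "c", "d"],
})

# One value per user: the number of distinct assignments.
per_user = events.groupby("Userid")["Assignmentid"].nunique()

# Like a FIXED expression, the same value can also be attached to every row.
events["Assignments Per User"] = (
    events.groupby("Userid")["Assignmentid"].transform("nunique")
)
print(per_user)
```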

tommct /
Last active Sep 11, 2016
Agglomerative Filtering Recipe for Python Sklearn using similarity matrix

This is a recipe for using Sklearn to build a cosine similarity matrix and then to build dendrograms from it.

import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy
import scipy.spatial.distance
from scipy.spatial.distance import pdist
from sklearn.metrics.pairwise import cosine_similarity

# Make a "feature matrix" of 15 items that will be the binary representation of each index.
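
The rest of the recipe is cut off in this preview. A minimal sketch of the pipeline described above might look like the following. It is not the author's code: using indices 1 to 15 (so that no row is all zeros), 4 bits per item, and average linkage are all assumptions.

```python
import numpy as np
import scipy.cluster.hierarchy as hierarchy
from scipy.spatial.distance import squareform
from sklearn.metrics.pairwise import cosine_similarity

# Assumption: items are the indices 1..15, each encoded as 4 binary features.
items = np.arange(1, 16)
features = ((items[:, None] >> np.arange(4)) & 1).astype(float)  # shape (15, 4)

# Cosine similarity between every pair of items, converted to a distance.
similarity = cosine_similarity(features)
distance = np.clip(1.0 - similarity, 0.0, None)
np.fill_diagonal(distance, 0.0)  # remove floating-point noise on the diagonal

# linkage expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(distance, checks=False)
Z = hierarchy.linkage(condensed, method="average")

# no_plot=True returns the dendrogram layout without drawing it.
tree = hierarchy.dendrogram(Z, no_plot=True, labels=items.tolist())
print(tree["ivl"])
```

Dropping `no_plot=True` draws the dendrogram on the current matplotlib axes.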
tommct /
Created Nov 6, 2015
Change Modification Date via Python

This is Python code for updating the modification date of a file on Mac OS X or Linux. In this example, I had copied .dv files from my camcorder. The filenames encoded the recording date, but the modification date was the time I transferred each file from the camcorder.

import os
import time
fpath = '/path/to/dv/files'
for root, dirs, files in os.walk(fpath):
    for name in files:
        if name[-3:]=='.dv':
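
The loop body is cut off in this preview. The sketch below shows one way the remaining step could be done with `os.utime`. The filename layout is not given in the source, so the `YYYY-MM-DD_HH-MM-SS` pattern here is purely hypothetical, as are the helper names `set_mtime_from_name` and `fix_tree`.

```python
import os
import re
import time

# Hypothetical pattern: a timestamp such as 2009-07-04_15-30-00 somewhere in the name.
STAMP = re.compile(r"(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})")


def set_mtime_from_name(path):
    """Set the file's modification time from its name; return the new mtime or None."""
    match = STAMP.search(os.path.basename(path))
    if match is None:
        return None
    parsed = time.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
    mtime = time.mktime(parsed)  # interpret the timestamp as local time
    atime = os.stat(path).st_atime  # leave the access time unchanged
    os.utime(path, (atime, mtime))
    return mtime


def fix_tree(root_dir):
    """Apply set_mtime_from_name to every .dv file under root_dir."""
    for root, dirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(".dv"):
                set_mtime_from_name(os.path.join(root, name))
```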
tommct /
Last active Aug 29, 2015
D3 Stacked Brush Plots

Implements multiple, stacked plots with brushing. This extends an earlier brush example and allows for multiple panels where each subsequent panel zooms from the previous. Data points are also smoothed, permitting data with over 100,000 points to have an overview with subsequent telescoping while maintaining context.

tommct /
Last active Jan 1, 2016
D3 Hierarchical Ordinal Ticks

This D3 example demonstrates constrained zooming, much like an earlier example, and also illustrates the use of hierarchical ordinal tick marks. It does this by using the normalized values that one gets when using a hierarchical partition layout.

