
David Yerrington dyerrington

parse_jupyter.md

Parse Jupyter

This is a basic class for parsing Jupyter notebooks. I built a larger version of it for a personal project, where it was used to cluster documents into semantic indices that linked related content together. You can use it to parse notebooks for tasks like NLP or preprocessing.

Usage

parser = ParseJupyter("./Untitled.ipynb")
parser.get_cells(source_only=True, source_as_string=True)
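The class body is not part of this preview. As a rough idea of what it does, here is a minimal sketch that assumes only the call signature from the usage example above; the body is a hypothetical reconstruction, not the original code. It relies on the fact that an .ipynb file is JSON with a top-level "cells" list, where each cell's "source" is either one string or a list of lines.

```python
import json


class ParseJupyter:
    """Minimal notebook reader (hypothetical sketch, not the original class)."""

    def __init__(self, path):
        # An .ipynb file is JSON; load it once and keep the parsed dict.
        with open(path, encoding="utf-8") as fh:
            self.notebook = json.load(fh)

    def get_cells(self, source_only=False, source_as_string=False):
        """Return cell dicts, or just their sources when source_only is True."""
        cells = self.notebook.get("cells", [])
        if not source_only:
            return cells
        sources = [cell.get("source", "") for cell in cells]
        if source_as_string:
            # nbformat allows "source" to be one string or a list of lines that
            # already end in a newline, so joining with "" restores the text.
            sources = [s if isinstance(s, str) else "".join(s) for s in sources]
        return sources
```

With `source_only=True, source_as_string=True` the call returns one plain string per cell, which is a convenient shape to hand to a tokenizer or vectorizer.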
bgweber / pandasUDF.py
Last active Oct 17, 2019
Distributing Feature Generation with Pandas UDFs
import featuretools as ft
from pyspark.sql.functions import pandas_udf, PandasUDFType

# `schema` (a StructType or DDL string describing the returned DataFrame) must be defined before this decorator runs
@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def apply_feature_generation(pandasInputDF):
    # create Entity Set representation
    es = ft.EntitySet(id="events")
    es = es.entity_from_dataframe(entity_id="events", dataframe=pandasInputDF)
    es = es.normalize_entity(base_entity_id="events", new_entity_id="users", index="user_id")
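The preview stops before the function returns. To illustrate the GROUPED_MAP contract (each call receives every row of one group as a pandas DataFrame and must return a DataFrame matching `schema`), here is a local sketch in plain pandas. It swaps featuretools' deep feature synthesis for a hand-written per-user aggregation; the column names `user_id` and `value`, the `partition` key, and both function names are assumptions for illustration.

```python
import pandas as pd


def generate_user_features(events: pd.DataFrame) -> pd.DataFrame:
    """Per-partition feature generation (hypothetical stand-in for featuretools DFS).

    Receives all rows of one partition and returns one row per user_id,
    mirroring the grouped-map contract: DataFrame in, DataFrame out.
    """
    return (
        events.groupby("user_id")
        .agg(event_count=("value", "size"), value_mean=("value", "mean"))
        .reset_index()
    )


def run_grouped_map(df: pd.DataFrame, key: str, func) -> pd.DataFrame:
    """Local emulation of a grouped map: split by key, apply func, concatenate."""
    parts = [func(group) for _, group in df.groupby(key)]
    return pd.concat(parts, ignore_index=True)
```

On Spark 3.x, where `PandasUDFType.GROUPED_MAP` is deprecated, the same undecorated function can be passed to `df.groupby("partition").applyInPandas(generate_user_features, schema=schema)`.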
dyerrington / readme.md
Last active Jan 9, 2019
This is a very basic data generator for testing recommender systems. A future version may simulate the actual sparseness of ratings data with a simple bootstrap function, but for now, the NumPy random generator does the job.

RecData

To use this snippet, install faker:

pip install faker
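The generator itself is not included in this preview. A minimal sketch of the idea follows; it uses Python's standard `random` module instead of NumPy and Faker to stay dependency-free, and the function name and parameters are hypothetical. The `density` parameter is a crude stand-in for the sparseness of real ratings data mentioned above: each user-item pair is rated with that probability.

```python
import random


def generate_ratings(n_users=5, n_items=10, density=0.3, scale=(1, 5), seed=None):
    """Return sparse (user_id, item_id, rating) triples for recommender tests.

    Hypothetical sketch: each user-item pair is rated with probability
    `density`, so roughly density * n_users * n_items ratings come back.
    """
    rng = random.Random(seed)  # a seeded generator makes test data reproducible
    low, high = scale
    ratings = []
    for user in range(n_users):
        for item in range(n_items):
            if rng.random() < density:
                ratings.append((user, item, rng.randint(low, high)))
    return ratings
```

Faker could then map the integer user IDs to realistic names if a test needs them.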
dyerrington / subplots.py
Created Mar 29, 2017
Plotting multiple figures with seaborn and matplotlib using subplots.
import matplotlib.pyplot as plt

##
# Create a figure with a grid of subplots: 3 columns and 2 rows
#
# Here is a useful template to use for working with subplots.
#
##################################################################
fig, ax = plt.subplots(figsize=(10,5), ncols=3, nrows=2)
left = 0.125 # the left side of the subplots of the figure
right = 0.9 # the right side of the subplots of the figure
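The template is cut off after the first two spacing variables. Here is a self-contained sketch of how such a template is typically finished, assuming the spacing values are passed to `Figure.subplots_adjust`; the spacing numbers and the plotted data are illustrative placeholders.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# 2 rows x 3 columns of axes; ax is a 2-D NumPy array of Axes objects.
fig, ax = plt.subplots(figsize=(10, 5), ncols=3, nrows=2)

fig.subplots_adjust(
    left=0.125,  # left edge of the subplot area, as a fraction of figure width
    right=0.9,   # right edge of the subplot area, as a fraction of figure width
    bottom=0.1,  # bottom edge, as a fraction of figure height
    top=0.9,     # top edge, as a fraction of figure height
    wspace=0.2,  # horizontal gap, as a fraction of the average axes width
    hspace=0.2,  # vertical gap, as a fraction of the average axes height
)

# ax.flat walks the grid row by row, so each panel gets a running index.
for i, panel in enumerate(ax.flat):
    panel.plot(range(10), [x * (i + 1) for x in range(10)])
    panel.set_title(f"Panel {i}")
```

Seaborn's axes-level functions accept an `ax=` argument, so any panel from `ax.flat` can be handed to them in place of `panel.plot`.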
meiamsome / hn_search.js
Last active Sep 7, 2019 — forked from kristopolous/hn_seach.js
hn job query search
/* Hacker News Search Script
*
* Original Script by Kristopolous:
* https://gist.github.com/kristopolous/19260ae54967c2219da8
*
* Usage:
* First, copy the script into your browser's console whilst on the Hacker News
* jobs page. Then, you can use the query function to filter the results.
*
* For example,
marcelcaraciolo / spearman.py
Created Sep 12, 2011
spearman coefficient
import datetime
import sys
import random
def _rank_dists(ranks1, ranks2):
    """Finds the difference between the values in ranks1 and ranks2 for keys
    present in both dicts. If the arguments are not dicts, they are converted
    from (key, rank) sequences.
    """
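Only the docstring of `_rank_dists` survives in this preview. A minimal sketch of how rank differences feed Spearman's coefficient, rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), follows. `rank_dists` and `spearman_correlation` are hypothetical names, and the formula assumes both inputs rank the same keys with distinct ranks 1..n (no ties); with ties, the Pearson correlation of average ranks is the usual fallback.

```python
def rank_dists(ranks1, ranks2):
    """Yield rank differences for keys present in both rankings (hypothetical helper).

    Like the docstring above, accepts dicts or (key, rank) sequences.
    """
    ranks1, ranks2 = dict(ranks1), dict(ranks2)
    for key in ranks1:
        if key in ranks2:
            yield ranks1[key] - ranks2[key]


def spearman_correlation(ranks1, ranks2):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied ranks."""
    n = 0
    sum_sq = 0.0
    for d in rank_dists(ranks1, ranks2):
        n += 1
        sum_sq += d * d
    if n < 2:
        raise ValueError("need at least two shared keys")
    return 1.0 - 6.0 * sum_sq / (n * (n * n - 1))
```

Identical rankings give 1.0 and fully reversed rankings give -1.0.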