Abraham Hmiel abehmiel

@abehmiel
abehmiel / fuzzy_join.py
Created October 31, 2017 19:10
Pandas fuzzy join
import difflib
from pandas import DataFrame

# input data
df1 = DataFrame([[1],[2],[3],[4],[5]], index=['one','two','three','four','five'], columns=['number'])
df2 = DataFrame([['a'],['b'],['c'],['d'],['e']], index=['one','too','three','fours','five'], columns=['letter'])
# want to obtain:
#      number letter
# one       1      a
# two       2      b
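A minimal sketch of the matching step such a join needs, assuming `difflib.get_close_matches` is used to snap each label in `df2`'s index onto the nearest label in `df1`'s index. The helper name `closest_label` and the `cutoff` of 0.6 are illustrative assumptions, not taken from the gist. It uses plain lists so it runs without pandas:

```python
import difflib

left_index = ['one', 'two', 'three', 'four', 'five']
right_index = ['one', 'too', 'three', 'fours', 'five']


def closest_label(label, candidates, cutoff=0.6):
    """Return the candidate most similar to `label`, or None if nothing clears `cutoff`."""
    matches = difflib.get_close_matches(label, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Map every label of the right-hand table onto the left-hand table's vocabulary.
mapping = {label: closest_label(label, left_index) for label in right_index}
print(mapping)
```

With pandas, a mapping like this could then be applied with `df2.rename(index=mapping)` before calling `df1.join(...)`, so that both frames share the same index labels.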
abehmiel / botmakers-10-11.org
Last active October 18, 2017 20:32
Notes on Botmakers meetup

Taken at Babycastles in NYC at the 10/11 NYC Botmakers Meetup. For more info: https://www.meetup.com/botmakers/. I make no claims as to the completeness of these notes.

Conversational chatbots - Gautam

3 days of full development (from "idk what a bot is" to v0.1)

chatbot client -> chatbot server -> conversation API

"What is the weather in NYC?" Turn that into an intent which the server can understand
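A rough sketch of what "turning a question into an intent" can look like, assuming a simple regex-based matcher. The intent name `get_weather`, the `location` slot, and the function `parse_intent` are hypothetical; a real conversation API would more likely use a trained language-understanding model:

```python
import re

# Hypothetical intent patterns; each named group becomes a slot in the result.
INTENT_PATTERNS = {
    "get_weather": re.compile(
        r"\bweather\b.*\bin\s+(?P<location>[\w\s]+?)\??$", re.IGNORECASE
    ),
}


def parse_intent(utterance):
    """Turn free text into a structured intent dict that a server could act on."""
    for name, pattern in INTENT_PATTERNS.items():
        match = pattern.search(utterance.strip())
        if match:
            return {"intent": name, "slots": match.groupdict()}
    return {"intent": "unknown", "slots": {}}


print(parse_intent("What is the weather in nyc?"))
```

The client only forwards the raw text; the server receives the structured dict and decides which backend (here, a weather lookup) to call.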

abehmiel / gist:d932a2b3028f836194db7cb3ffd49334
Created October 17, 2017 20:30 — forked from econchick/gist:4666413
Python implementation of Dijkstra's Algorithm
from collections import defaultdict

class Graph:
    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(list)
        self.distances = {}

    def add_node(self, value):
        self.nodes.add(value)

    def add_edge(self, from_node, to_node, distance):
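For reference, a self-contained sketch of Dijkstra's algorithm itself. This is not the gist's code: it assumes a binary heap (`heapq`) and an adjacency mapping of `node -> [(neighbor, distance), ...]`, and the sample graph is made up for illustration:

```python
import heapq
from collections import defaultdict


def dijkstra(edges, source):
    """Return the shortest distance from `source` to every reachable node.

    `edges` maps node -> list of (neighbor, distance) pairs; distances must be non-negative.
    """
    distances = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in edges[node]:
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Illustrative graph; edges are treated as undirected.
graph = defaultdict(list)
for a, b, d in [("A", "B", 7), ("A", "C", 9), ("B", "C", 2), ("C", "D", 1)]:
    graph[a].append((b, d))
    graph[b].append((a, d))

print(dijkstra(graph, "A"))
```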
abehmiel / nycc-tech-committee-algotransparency.org
Last active April 12, 2018 19:11
My notes of the NYCC Tech Committee meeting on the Algorithmic Transparency Bill, 16-96

These notes may have errors and omissions. I couldn’t get the names of a lot of the speakers, and there are some places where I was thinking or distracted. I make no claims as to the completeness of this information.

Algorithmic transparency legislation hearing 10/16/17

James Vacca, Chair of NYCC committee on technology

16-96 2017: Measures of transparency when NYC uses algorithms to impose penalties or police persons

  • Requires publication of source code and querying systems with sample data

  • If left unchecked, algorithms can have negative repercussions
  • Algorithms are a way of encoding assumptions
abehmiel / keybase.md
Created October 11, 2017 17:07
Keybase proof

Keybase proof

I hereby claim:

  • I am abehmiel on github.
  • I am abehmiel (https://keybase.io/abehmiel) on keybase.
  • I have a public key whose fingerprint is 9268 F147 2D66 ED22 5564 4480 AB82 9B94 356E D366

To claim this, I am signing this object:

abehmiel / sbuzz.py
Last active October 24, 2017 05:03
Buzzfeed article scraper for NLP
from bs4 import BeautifulSoup
import requests
# for cleaning:
import re
import string
import nltk
from itertools import chain
def scrape_buzzfeed_article(url):
abehmiel / prepare_yr_library.py
Created August 12, 2017 01:50
Library for hackerrank challenges
""" GCD / LCM """
# greatest common divisor
from math import gcd  # fractions.gcd was removed in Python 3.9
# usage: gcd(x, y)
# least common multiple
def lcm(x, y):
    """ This function takes two
    integers and returns the L.C.M. """
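A minimal sketch of one way to compute the L.C.M., assuming the standard identity lcm(x, y) = |x * y| / gcd(x, y). Returning 0 when either argument is 0 is an assumption made here, not something shown in the gist:

```python
from math import gcd


def lcm(x, y):
    """Return the least common multiple of two integers (0 if either is 0)."""
    if x == 0 or y == 0:
        return 0
    # Integer division is exact because gcd(x, y) divides x * y.
    return abs(x * y) // gcd(x, y)


print(lcm(4, 6))
```

On Python 3.9 and later, `math.lcm` provides the same result directly.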
abehmiel / beautiful_idiomatic_python.md
Created August 11, 2017 21:44 — forked from JeffPaine/beautiful_idiomatic_python.md
Transforming Code into Beautiful, Idiomatic Python: notes from Raymond Hettinger's talk at pycon US 2013. The code examples and direct quotes are all from Raymond's talk. I've reproduced them here for my own edification and the hopes that others will find them as handy as I have!

Transforming Code into Beautiful, Idiomatic Python

Notes from Raymond Hettinger's talk at pycon US 2013 video, slides.

The code examples and direct quotes are all from Raymond's talk. I've reproduced them here for my own edification and the hopes that others will find them as handy as I have!

Looping over a range of numbers

for i in [0, 1, 2, 3, 4, 5]:
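A small sketch of the contrast this section is about, assuming Python 3, where `range` produces its values lazily instead of building a list. The variable names are illustrative:

```python
# Explicit list: fine for six items, unwieldy for six million.
squares_from_list = [i ** 2 for i in [0, 1, 2, 3, 4, 5]]

# range() yields the same values on demand, without materializing the list.
squares_from_range = [i ** 2 for i in range(6)]

print(squares_from_range)
```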
abehmiel / test_prime.py
Last active August 10, 2017 17:23
Even better parameterized pytest boilerplate
# test_prime.py
# from James Routley's blog: https://jamesroutley.co.uk/tech/2017/08/09/parameterise-python-tests.html
import pytest
from prime import is_prime
@pytest.mark.parametrize("x,output", [
(-1, False),
(0, False),
abehmiel / PRP_scrape.py
Last active August 12, 2017 16:57
Press release scrape for pressreleasepoint.com - verbose output to console and saves to text file
"""
Because this code takes so long to run as coded below, I recommend following it
up with a check for duplicate files (fdupes -dN on Linux seems to work).
After downloading, you can combine the files into a single corpus file by concatenating:
find . -name "*.txt" -exec cat '{}' ';' > dirty.txt
Then you can use whatever means you wish to clean up the text, remove Unicode symbols,
and so on.
"""
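The de-duplication and concatenation steps described in the docstring could also be done in Python. The following is only a sketch under stated assumptions: the function name `build_corpus` is hypothetical, and duplicates are detected by comparing SHA-256 hashes of file contents, which catches exact copies only:

```python
import hashlib
from pathlib import Path


def build_corpus(source_dir, corpus_path):
    """Concatenate every unique .txt file under `source_dir` into `corpus_path`.

    Files are compared by the SHA-256 of their bytes, so exact duplicates are
    written once. Returns the number of files written.
    """
    seen = set()
    written = 0
    corpus_path = Path(corpus_path)
    with corpus_path.open("wb") as corpus:
        for path in sorted(Path(source_dir).rglob("*.txt")):
            if path.resolve() == corpus_path.resolve():
                continue  # never feed the corpus file back into itself
            data = path.read_bytes()
            digest = hashlib.sha256(data).hexdigest()
            if digest in seen:
                continue  # exact duplicate of a file already written
            seen.add(digest)
            corpus.write(data)
            written += 1
    return written
```

Calling `build_corpus(".", "dirty.txt")` would produce roughly what the `find ... -exec cat` command does, minus the duplicate files.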