Gavin Hackeling gavinmh

@gavinmh
gavinmh / example.ipynb
Created August 14, 2020 16:31
example.ipynb
@gavinmh
gavinmh / viterbi.py
Created November 19, 2012 04:27
Viterbi Algorithm
# -*- coding: utf-8 -*-
"""
This is an example of a basic optical character recognition system.
Some components, such as the featurizer, are missing, and have been replaced
with data that I made up.
This system recognizes words produced from an alphabet of 2 letters: 'l' and 'o'.
Words that can be recognized include 'lol', 'lolol', and 'loooooll'.
We'll assume that this system is used to digitize hand-written notes by Redditors,
or something.
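
The gist preview stops before any code, so the following is only a minimal sketch of Viterbi decoding over the two-letter alphabet described above, not the author's implementation. The stroke features ("tall", "round") and every probability in `START`, `TRANS`, and `EMIT` are made-up assumptions standing in for the missing featurizer.

```python
import math

STATES = ("l", "o")

# Hypothetical model parameters, invented for illustration only.
START = {"l": 0.6, "o": 0.4}
TRANS = {"l": {"l": 0.3, "o": 0.7}, "o": {"l": 0.6, "o": 0.4}}
# Probability of seeing a stroke shape given the true letter.
EMIT = {"l": {"tall": 0.9, "round": 0.1}, "o": {"tall": 0.2, "round": 0.8}}


def viterbi(observations):
    """Return the most probable letter sequence for the observed stroke shapes."""
    if not observations:
        return ""
    # best[s] is the log-probability of the best path that ends in state s.
    best = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in STATES:
            # Choose the predecessor that maximizes the path score into s.
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            step[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        best = step
        backpointers.append(pointers)
    # Walk the backpointers from the best final state to recover the path.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


word = viterbi(["tall", "round", "tall"])
```

Working in log space avoids numeric underflow on long words. With these assumed parameters, the three observations above decode to 'lol'.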
@gavinmh
gavinmh / rcv1-topics.txt
Created August 16, 2013 21:45
RCV1 Topics
CCAT: CORPORATE/INDUSTRIAL
C11: STRATEGY/PLANS
C12: LEGAL/JUDICIAL
C13: REGULATION/POLICY
C14: SHARE LISTINGS
C15: PERFORMANCE
C151: ACCOUNTS/EARNINGS
C1511: ANNUAL RESULTS
C152: COMMENT/FORECASTS
C16: INSOLVENCY/LIQUIDITY
@gavinmh
gavinmh / cassandra-notes.md
Last active July 23, 2020 20:52
Cassandra Notes

Cassandra Notes

Introduction

Apache Cassandra is an open-source, distributed database management system designed to handle large amounts of data across many commodity servers. It is queried with a language named CQL.

Cassandra's data model is a partitioned row store; Cassandra combines elements of key-value stores and tabular/columnar databases. Like a relational database, Cassandra stores data in tables, called column families, that have defined columns and associated data types. Each row in a column family is uniquely identified by a key. Each row has multiple columns, each of which has a timestamp, name, and value. Unlike a relational database, each row in a column family does not need to have the same set of columns. At any time, a column may be added to one or more rows. If this explanation is unclear, you might think of column families instead as sets of key-value pairs, in which the values are nested sets of key-value pairs.
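
The "nested key-value pairs" view can be sketched with plain Python dictionaries. This is an in-memory illustration only, not Cassandra's storage format or any driver API. The `users` column family, the `put` and `get` helpers, and the sample data are hypothetical, and the rule that the newest timestamp wins is a simplified assumption about how competing writes to a column are reconciled.

```python
import time

# A column family modeled as {row_key: {column_name: (value, timestamp)}}.
users = {}


def put(column_family, row_key, column, value, timestamp=None):
    """Write one column of one row, keeping the write with the newest timestamp."""
    ts = time.time() if timestamp is None else timestamp
    row = column_family.setdefault(row_key, {})
    current = row.get(column)
    if current is None or ts >= current[1]:
        row[column] = (value, ts)


def get(column_family, row_key, column):
    """Return the column's value, or None if the row or column is absent."""
    cell = column_family.get(row_key, {}).get(column)
    return None if cell is None else cell[0]


# Rows in the same column family do not need the same set of columns.
put(users, "alice", "email", "alice@example.com", timestamp=1)
put(users, "alice", "city", "Austin", timestamp=1)
put(users, "bob", "email", "bob@example.com", timestamp=1)

# A column can be added to an existing row at any time.
put(users, "bob", "phone", "555-0100", timestamp=2)
```

Here the row keyed "alice" has `email` and `city` columns while the row keyed "bob" has `email` and `phone`, which a fixed relational schema would not allow without nullable columns.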

The following depicts two rows of a column-family fro

@gavinmh
gavinmh / ner.py
Last active December 5, 2018 18:59
Named Entity Extraction with NLTK in Python
# -*- coding: utf-8 -*-
'''
'''
from nltk import sent_tokenize, word_tokenize, pos_tag, ne_chunk
def extract_entities(text):
    entities = []
    for sentence in sent_tokenize(text):
@gavinmh
gavinmh / socks_proxy.sh
Created February 23, 2018 01:42
SOCKS Proxy
ssh -D 1337 -f -C -q -N user@remote -p 22
@gavinmh
gavinmh / gist:6834934
Created October 5, 2013 00:16
Bayes' Theorem By/For Idiots
# Bayes' Theorem
Let's pretend that you wish to find the probability that two events, A and B, occur.
If A and B are independent events, then the probability that A and B both occur is
P(A)P(B).
However, A and B might be related events. If they are not independent, the probability that A and B both occur is
P(A)P(B|A)
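
A small worked example may make the two cases concrete. The coin and card scenarios below are illustrative assumptions, not part of the original note. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Independent events: two flips of a fair coin.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads  # P(A)P(B)

# Dependent events: drawing two aces from a 52-card deck without replacement.
p_first_ace = Fraction(4, 52)                # P(A)
p_second_ace_given_first = Fraction(3, 51)   # P(B|A): 3 aces left among 51 cards
p_two_aces = p_first_ace * p_second_ace_given_first  # P(A)P(B|A)

# Wrongly treating the draws as independent gives a different answer.
p_two_aces_if_independent = p_first_ace * p_first_ace

print(p_two_heads, p_two_aces, p_two_aces_if_independent)
```

The two-ace probability comes out to 1/221, whereas assuming independence would give 1/169, so the conditional term P(B|A) matters whenever the first event changes the odds of the second.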
@gavinmh
gavinmh / psql.md
Created September 5, 2017 02:47 — forked from cimmanon/psql.md
PostgreSQL cheat sheet for MySQL users

I use PostgreSQL via the psql client. If you use a different client (e.g., pgAdmin), I don't know how much of this will carry over.

One nice difference between psql and mysql (cli) is that if you press CTRL+C, it won't exit the client.

User administration

Log in as superuser (via shell)

psql -U postgres
@gavinmh
gavinmh / pil_to_numpy.py
Created July 18, 2017 00:23
PIL to NumPy to PIL
import numpy
import PIL.Image  # a bare "import PIL" does not load the Image submodule
# Convert Image to array
img = PIL.Image.open("foo.jpg").convert("L")
arr = numpy.array(img)
# Convert array to Image
img = PIL.Image.fromarray(arr)