"""Query AlchemyAPI to determine number of API calls still available"""
# -*- coding: utf-8 -*-
import json
import requests


def get_api_key():
    # Load the API key (a 40-character hex string) from a local file;
    # the context manager makes sure the file is closed afterwards
    with open('api_key.txt') as f:
        key = f.readline().strip()
    return key
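The comment above says the key is a 40-character hex string. Below is a minimal sketch that checks that format before the key is used. It assumes only the file name `api_key.txt` from the snippet. The helper names `is_valid_api_key` and `load_api_key` are hypothetical, and no AlchemyAPI endpoint is called.

```python
import re

# 40 hexadecimal characters, as described in the comment in get_api_key()
HEX40 = re.compile(r'[0-9a-fA-F]{40}')


def is_valid_api_key(key):
    """Return True if key is exactly 40 hexadecimal characters."""
    return HEX40.fullmatch(key) is not None


def load_api_key(path='api_key.txt'):
    """Read the first line of path and fail early if it is not a 40-char hex key."""
    with open(path) as f:
        key = f.readline().strip()
    if not is_valid_api_key(key):
        raise ValueError('expected a 40-character hexadecimal API key in %s' % path)
    return key
```

Failing at load time makes a truncated or mis-pasted key show up as a clear local error rather than as a rejected request.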
alvations / nltk-intro.py
Created October 1, 2015 12:58 — forked from alexbowe/nltk-intro.py
Demonstration of extracting key phrases with NLTK in Python
import nltk
text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""
# Used when tokenizing words
sentence_re = r'''(?x) # set flag to allow verbose regexps
([A-Z])(\.[A-Z])+\.? # abbreviations, e.g. U.S.A.
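The abbreviation branch of `sentence_re` uses capturing groups. With Python's `re.findall`, which recent NLTK releases use inside `nltk.regexp_tokenize`, capturing groups make the result a list of group tuples instead of whole tokens. The sketch below uses only the standard `re` module, not NLTK. Only the abbreviation branch comes from the gist, rewritten with non-capturing `(?:...)` groups. The other branches are illustrative assumptions, since the rest of the pattern is cut off above.

```python
import re

# Verbose tokenizer in the spirit of sentence_re. All groups are
# non-capturing, so findall returns each whole match as one token.
TOKEN_RE = re.compile(r'''(?x)
      [A-Z](?:\.[A-Z])+\.?   # abbreviations, e.g. U.S.A.
    | \w+(?:-\w+)*           # words with optional internal hyphens
    | \.\.\.                 # ellipsis
    | [.,;"'?():`-]          # punctuation as separate tokens
''')


def tokenize(text):
    """Return the tokens matched by TOKEN_RE, in order of appearance."""
    return TOKEN_RE.findall(text)


if __name__ == '__main__':
    print(tokenize("The U.S.A. has a well-known policy..."))
```

For contrast, `re.findall(r'([A-Z])(\.[A-Z])+\.?', 'U.S.A.')` returns `[('U', '.A')]`. That is the last capture of each group, not the token.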
alvations / bulba-parser.rb
Created July 3, 2016 16:04 — forked from meew0/bulba-parser.rb
Ruby script to parse a dump of Bulbapedia's Pokémon pages into obtainability data
# This script parses a dump of Bulbapedia's Pokémon pages into a JSON file
# with details about what Pokémon are obtainable in respective regions
# (specifically, the latest series of games set in a specific region).
require 'nokogiri'
require 'json'
# An XML dump of all of Bulbapedia's Pokémon pages is required to exist at
# this path. It can be generated using this special page:
# http://bulbapedia.bulbagarden.net/wiki/Special:Export
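The Ruby script reads a Special:Export XML dump with Nokogiri. Below is a Python sketch of only the first step, streaming page titles and wikitext out of such a dump with the standard library's `ElementTree.iterparse`. It assumes the usual MediaWiki export layout (`<page>` containing `<title>` and `<revision><text>`). It does not reproduce the script's obtainability extraction, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def iter_pages(xml_source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export.

    Namespaces are ignored, so the export schema version does not matter.
    If a page has several revisions, the text of the last one is kept.
    """
    for _event, elem in ET.iterparse(xml_source, events=('end',)):
        if local_name(elem.tag) != 'page':
            continue
        title, text = None, None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == 'title' and title is None:
                title = child.text
            elif name == 'text':
                text = child.text or ''
        yield title, text
        elem.clear()  # drop the parsed page to keep memory flat on large dumps


if __name__ == '__main__':
    sample = (b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
              b'<page><title>Bulbasaur</title>'
              b'<revision><text>{{Pokemon}}</text></revision></page>'
              b'</mediawiki>')
    for title, text in iter_pages(io.BytesIO(sample)):
        print(title, text)
```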
alvations / google_twunter_lol
Created February 21, 2017 06:51 — forked from jamiew/google_twunter_lol
All the dirty words from Google's "what do you love" project: http://www.wdyl.com/
easterEgg.BadWorder.list={
"4r5e":1,
"5h1t":1,
"5hit":1,
a55:1,
anal:1,
anus:1,
ar5e:1,
arrse:1,
arse:1,
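The JavaScript object maps every listed word to `1`, so it works as a set for constant-time lookups. Below is a minimal Python equivalent, assuming a plain exact-match check after lowercasing and trimming punctuation. The entries are a few copied from the list above, and `flagged_words` is a hypothetical helper.

```python
# A handful of entries from the list above; a set gives the same
# constant-time membership test as the JS object of word -> 1.
BAD_WORDS = {"4r5e", "5h1t", "5hit", "a55", "ar5e", "arrse", "arse"}


def flagged_words(text, blocklist=BAD_WORDS):
    """Return the words in text that appear in blocklist.

    Words are split on whitespace, lowercased, and stripped of
    surrounding punctuation before the lookup.
    """
    words = (w.strip('.,!?;:') for w in text.lower().split())
    return [w for w in words if w in blocklist]
```

Because the match is exact, spelling variants are caught only if they appear in the list, which is why the original enumerates forms like `5h1t` and `5hit` separately.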
alvations / docx2md.md
Created May 4, 2017 09:07 — forked from vdavez/docx2md.md
Convert a Word Document into MD

Converting a Word Document to Markdown in Two Moves

The Problem

A lot of important government documents are created and saved in Microsoft Word (*.docx). But Word's format is proprietary, and it's not really useful for presenting documents on the web. So, I wanted to find a way to convert a .docx file into markdown.

The Solution

As it turns out, there are several open-source tools that allow for conversion between file types. Pandoc is one of them, and it's powerful. In fact, pandoc's website says "If you need to convert files from one markup format into another, pandoc is your swiss-army knife." But at the time of writing, although pandoc could convert from markdown into .docx, it could not convert in the other direction. Pandoc 1.13 and later releases include a .docx reader.
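With a pandoc release that includes a .docx reader (1.13 or later), the conversion can be a single pandoc call. The Python wrapper below is a sketch of that one-step route, not the two-move approach described here. It assumes `pandoc` is on `PATH`, and the file names and helper names are hypothetical.

```python
import shutil
import subprocess


def pandoc_docx_to_md_cmd(src, dst, to='markdown'):
    """Build the pandoc argument list: read .docx, write Markdown to dst."""
    return ['pandoc', '-f', 'docx', '-t', to, '-o', dst, src]


def convert_docx(src, dst):
    """Run pandoc on src, raising if pandoc is missing or the conversion fails."""
    if shutil.which('pandoc') is None:
        raise RuntimeError('pandoc is not installed or not on PATH')
    subprocess.run(pandoc_docx_to_md_cmd(src, dst), check=True)


if __name__ == '__main__':
    print(' '.join(pandoc_docx_to_md_cmd('report.docx', 'report.md')))
```

Building the argument list separately keeps the command easy to inspect. Passing a list rather than a shell string also avoids quoting problems with file names that contain spaces.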

alvations / colors.py
Created July 13, 2017 01:10 — forked from sheljohn/colours-old.py
Print with colors in most shells (Python, standalone)
class ColorPrinter:
    """
    Usage:
        cprint = ColorPrinter()
        cprint.cfg('c','m','bux').out('Hello','World!')
        cprint.rst().out('Bye now...')
    See: http://stackoverflow.com/a/21786287/472610
    See: https://en.wikipedia.org/wiki/ANSI_escape_code
    """
alvations / mini_sequence_labeler.py
Created August 22, 2017 00:06 — forked from hal3/mini_sequence_labeler.py
PyTorch implementation of a sequence labeler (POS tagger).
"""
PyTorch implementation of a sequence labeler (POS tagger).
Basic architecture:
- take words
- run through bidirectional GRU
- predict labels one word at a time (left to right), using a recurrent neural network "decoder"
The decoder updates hidden state based on:
- most recent word
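The architecture notes describe a decoder that predicts labels one word at a time, left to right, and updates its hidden state at each step. The framework-free sketch below shows only that control flow. `update` and `score` are hypothetical stand-ins for the GRU cell and the output layer. Feeding the previous predicted label back into `update` is an assumption here, since the list of decoder inputs above is cut off.

```python
def greedy_decode(encodings, init_state, update, score, labels):
    """Predict one label per position, left to right.

    update(state, enc, prev_label) -> new state   (stand-in for an RNN cell)
    score(state, label) -> float                  (stand-in for an output layer)
    Each prediction is fed back as prev_label for the next step.
    """
    state, prev_label, predicted = init_state, None, []
    for enc in encodings:
        state = update(state, enc, prev_label)
        best = max(labels, key=lambda lab: score(state, lab))
        predicted.append(best)
        prev_label = best
    return predicted


# Toy stand-ins: the "state" is just (current word, previous label).
TAGS = ['DET', 'NOUN', 'VERB']


def toy_update(state, word, prev_label):
    return (word, prev_label)


def toy_score(state, label):
    word, prev = state
    if word in ('the', 'a'):
        want = 'DET'
    elif prev == 'DET':
        want = 'NOUN'
    else:
        want = 'VERB'
    return 1.0 if label == want else 0.0


if __name__ == '__main__':
    print(greedy_decode(['the', 'dog', 'runs'], None, toy_update, toy_score, TAGS))
```

In the toy run, "dog" is tagged NOUN only because the previous prediction was DET. That dependence is why this decoder must run sequentially rather than label every word independently.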
alvations / dynet-tagger.py
Created August 27, 2017 00:28 — forked from neubig/dynet-tagger.py
A small sequence labeler in DyNet
"""
DyNet implementation of a sequence labeler (POS tagger).
This is a translation of this tagger in PyTorch: https://gist.github.com/hal3/8c170c4400576eb8d0a8bd94ab231232
Basic architecture:
- take words
- run through bidirectional GRU
- predict labels one word at a time (left to right), using a recurrent neural network "decoder"
The decoder updates hidden state based on:
- most recent word
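The "bidirectional GRU" step runs one recurrence left to right and another right to left, then pairs the two states at each position. The sketch below shows that pattern without DyNet. `step` is any function `(state, x) -> state` standing in for a GRU cell. The toy `accumulate` step, which just collects the words seen so far, is purely illustrative.

```python
def run_rnn(inputs, step, init):
    """Apply step(state, x) left to right and return every intermediate state."""
    states, state = [], init
    for x in inputs:
        state = step(state, x)
        states.append(state)
    return states


def bidirectional(inputs, step_fwd, step_bwd, init):
    """Pair forward and backward states so position i sees both contexts."""
    inputs = list(inputs)
    fwd = run_rnn(inputs, step_fwd, init)
    # Run over the reversed sequence, then flip back to align positions.
    bwd = run_rnn(inputs[::-1], step_bwd, init)[::-1]
    return list(zip(fwd, bwd))


def accumulate(state, x):
    """Toy step: the state is the tuple of inputs seen so far."""
    return state + (x,)


if __name__ == '__main__':
    for pair in bidirectional(['a', 'b', 'c'], accumulate, accumulate, ()):
        print(pair)
```

With the toy step, the first position's backward state already contains every word, which is the property a tagger gets from the backward GRU.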
alvations / mean_target_encoding.py
Created September 29, 2017 11:49 — forked from ogrisel/mean_target_encoding.py
Mean target value encoding for categorical variable using dask
import os
import os.path as op
from time import time
import dask.dataframe as ddf
import dask.array as da
from dask import delayed, compute
from distributed import Client
def make_categorical_data(n_samples=int(1e7), n_features=10):
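Mean target encoding replaces each category with the average target value observed for that category. The gist computes this with dask. The sketch below is a plain-Python version for small in-memory lists, not the gist's dask code, and `mean_target_encode` is a hypothetical name.

```python
from collections import defaultdict


def mean_target_encode(categories, targets):
    """Return (encoded values, per-category means).

    Each category is mapped to the mean of the targets on rows with
    that category; encoded values follow the input order.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in counts}
    return [means[cat] for cat in categories], means


if __name__ == '__main__':
    encoded, means = mean_target_encode(['a', 'b', 'a', 'c'], [1, 0, 0, 1])
    print(encoded, means)
```

In practice the means are usually computed on training data only (often per cross-validation fold), because encoding a row with a mean that includes its own target leaks label information into the feature.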