Andrew Montalenti amontalenti

@amontalenti
amontalenti / livereload-server.py
Created February 10, 2014 19:34
Example server using livereload 2.0, Flask, and formic to monitor filesystem changes, re-build files, and run a simple static web server
#!/usr/bin/env python
#
# simple static Flask fileserving app
# with livereload integration
#
from flask import Flask
STATIC_FOLDER = "."

Lucene Fundamentals

A set of Lucene fundamentals that is useful for grokking Elasticsearch.

Jargon Glossary

  • document: a record; the unit of search; the thing returned as search results
  • field: a typed slot in a document for storing and indexing values
  • index: a collection of documents, typically with the same field mappings or schema
  • corpus: the entire set of documents in an index
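
The glossary terms above can be made concrete with a toy in-memory model. This is only a minimal sketch in plain Python, not Lucene's or Elasticsearch's API: the names `ToyIndex`, `add`, and `search` are hypothetical, and tokenization is assumed to be simple lowercase whitespace splitting.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical toy index: a collection of documents plus an inverted index."""

    def __init__(self):
        # The corpus: every document stored in this index.
        self.corpus = []
        # (field, term) -> set of ids of documents with that term in that field.
        self.postings = defaultdict(set)

    def add(self, document):
        """Store a document (a dict of field -> text) and index each field."""
        doc_id = len(self.corpus)
        self.corpus.append(document)
        for field, value in document.items():
            # Assumption: lowercase + whitespace split stands in for real analysis.
            for term in str(value).lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, term):
        """Return the documents whose `field` contains `term`."""
        doc_ids = sorted(self.postings.get((field, term.lower()), ()))
        return [self.corpus[i] for i in doc_ids]


index = ToyIndex()
index.add({"title": "Lucene in Action", "body": "indexing and search"})
index.add({"title": "Elasticsearch Guide", "body": "distributed search on Lucene"})

hits = index.search("body", "lucene")
print([doc["title"] for doc in hits])
```

Here each dict is a document, each key is a field, `ToyIndex` plays the role of the index, and `index.corpus` is the corpus. Searches are scoped to a field, which loosely mirrors how fields are indexed separately.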
amontalenti / script-inject-http-proxy.js
Created November 21, 2012 17:16
Script-injecting proxy for Node.js
var httpProxy = require('http-proxy');
var url = require('url');
httpProxy.createServer(function(req, res, proxy) {
    var isHtml = false,
        write = res.write,
        writeHead = res.writeHead,
        params = url.parse(req.url, true).query,
        dest = params.dest || 'localhost',
-- XXX: all of this is a bad idea, but it was a nice idea at the time :)
CREATE TABLE IF NOT EXISTS apikey_changed_urls (
process_minute timestamp, -- current minute of processing
apikey text, -- apikey
url text, -- url where data changed
change_time timestamp, -- 5-min period where data changed
process_hour timestamp, -- current hour of processing
process_day timestamp, -- current day of processing
PRIMARY KEY (process_minute, apikey, url, change_time));
>>> import re
>>> eml = re.compile(r"([^@|\s]+@[^@]+\.[^@|\s]+)")
>>> match = eml.search("some text that has a test@test.com email address")
>>> match.group(1)
'test@test.com'
#!/bin/sh
# on Ubuntu 14.04, set a pm-hibernate resume hook
# which is placed in /etc/pm/sleep.d/00_intel_pstate
# it sets up CPU intel pstate appropriately, which by default
# gets ruined by being scaled down to 50% of max CPU without the
# easy ability to change it back; also sets governor to performance
# for good measure, since apparently default is "powersave"
amontalenti / sitemap_spider.py
Last active October 6, 2021 15:44
Simple script that uses BeautifulSoup, requests, and urlparse to spider a sitemap.xml file (CNN used as example)
import os
import requests
from BeautifulSoup import BeautifulSoup
from urlparse import urlparse
sitemap_xml = "http://www.cnn.com/sitemaps/sitemap-specials-2013-11.xml"
sitemap_response = requests.get(sitemap_xml)
soup = BeautifulSoup(sitemap_response.content)
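
The preview above stops after parsing the response. As a rough illustration of the likely next step (pulling `<loc>` URLs out of the sitemap), here is a minimal Python 3 sketch using only the standard library rather than the gist's BeautifulSoup and `urlparse` imports. The `SAMPLE_SITEMAP` data and the `sitemap_urls` helper are hypothetical and stand in for a fetched response; this is not the gist's actual code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical sample data standing in for the body of a sitemap.xml response.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/specials/a.html</loc></url>
  <url><loc>http://www.example.com/specials/b.html</loc></url>
</urlset>
"""

# Sitemap elements live in this XML namespace (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the text of every <url>/<loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


for page_url in sitemap_urls(SAMPLE_SITEMAP):
    # urlparse splits each URL so the path can be used, e.g. as a local filename.
    print(urlparse(page_url).path)
```

With a live sitemap, the text passed to `sitemap_urls` would come from the HTTP response instead of the inline sample.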
amontalenti / bigram_freq.py
Created December 15, 2013 16:57
example of using nltk to get bigram frequencies
>>> from nltk import word_tokenize
>>> from nltk.collocations import BigramCollocationFinder
>>> text = "obama says that obama says that the war is happening"
>>> finder = BigramCollocationFinder.from_words(word_tokenize(text))
>>> finder.items()[0:5]
[(('obama', 'says'), 2),
(('says', 'that'), 2),
(('is', 'happening'), 1),
(('that', 'obama'), 1),
(('that', 'the'), 1)]
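
For comparison, the same counts can be approximated without nltk. This is a minimal sketch, assuming plain whitespace splitting is an acceptable stand-in for `word_tokenize`; the helper name `bigram_counts` is hypothetical.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs; whitespace splitting stands in for a tokenizer."""
    words = text.split()
    # zip(words, words[1:]) pairs each word with the word that follows it.
    return Counter(zip(words, words[1:]))


text = "obama says that obama says that the war is happening"
counts = bigram_counts(text)
print(counts.most_common(2))
# [(('obama', 'says'), 2), (('says', 'that'), 2)]
```

`Counter` only gives raw frequencies; the nltk finder is still the better fit when association scores or frequency filters are needed.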
from itertools import chain, product
from re import match, findall
GRAMMAR = '''
<sentence> ::= <noun phrase> <verb phrase>
<noun> ::= "boy " | "troll " | "moon " | "telescope "
<transitive verb> ::= "hits " | "sees "
<intransitive verb> ::= "runs " | "sleeps "
<adjective> ::= "big " | "red "
<adverb> ::= "quickly " | "quietly "
amontalenti / install.sh
Last active March 6, 2018 20:56
pyenv Python 2.7 install to match Ubuntu server settings
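# the space before each trailing backslash keeps the options separated;
# inside double quotes a backslash-newline is removed without adding one
export PYTHON_CONFIGURE_OPTS="--enable-ipv6 \
--enable-unicode=ucs4 \
--with-dbmliborder=bdb:gdbm \
--with-system-expat \
--with-system-ffi \
--with-fpectl"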
pyenv install -f 2.7.14