Matthew Kemp mkemp

@mkemp
mkemp / Spark Shell
Created May 11, 2015 15:05
Used in CloudCamp Chicago 2015.05.11 presentation.
$ pyspark
Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
mkemp / word_count.py
Created May 11, 2015 15:01
Used in CloudCamp Chicago 2015.05.11 presentation.
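#!/usr/bin/env python
import re
import string
from operator import add

# Matches any single ASCII punctuation character.
regex = re.compile('[%s]' % re.escape(string.punctuation))

def word_count(sc, in_file_name, out_file_name):
    # Replace punctuation with spaces, lowercase, and emit (word, 1) per non-empty token.
    sc.textFile(in_file_name) \
        .flatMap(lambda line: [(word, 1) for word in regex.sub(' ', line).strip().lower().split(' ') if word]) \
        .reduceByKey(add) \
        .saveAsTextFile(out_file_name)  # assumed ending; the original listing is truncated here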
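Without a Spark cluster, the stages of this pipeline can be imitated in plain Python. This is a minimal sketch, not the author's code: it assumes the truncated chain continues with a `reduceByKey` sum, as is typical for a Spark word count, and `tokenize` and `local_word_count` are hypothetical names. The explicit grouping step stands in for the shuffle Spark performs between partitions.

```python
import re
import string
from collections import defaultdict
from functools import reduce
from operator import add

# Same punctuation pattern as word_count.py.
regex = re.compile('[%s]' % re.escape(string.punctuation))


def tokenize(line):
    """Map step, mirroring the flatMap: one (word, 1) pair per non-empty lowercase token."""
    return [(word, 1) for word in regex.sub(' ', line).strip().lower().split(' ') if word]


def local_word_count(lines):
    """Hypothetical single-process stand-in for flatMap followed by reduceByKey(add)."""
    # Map: flatten the (word, 1) pairs from every line.
    pairs = [pair for line in lines for pair in tokenize(line)]
    # Shuffle: group values by key (Spark moves these between partitions).
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    # Reduce: fold each key's values with the same function reduceByKey would use.
    return {word: reduce(add, ones) for word, ones in groups.items()}


counts = local_word_count([
    "He lay on his armour-like back,",
    "and if he lifted his head a little he could see his brown belly,",
])
print(counts['he'], counts['his'], counts['armour'])  # prints: 3 3 1
```

Note that "armour-like" is counted as two words, "armour" and "like", because the regex turns the hyphen into a space.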
mkemp / word_count.sh
Created May 11, 2015 14:57
Used in CloudCamp Chicago 2015.05.11 presentation.
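#!/bin/bash
# Usage: word_count.sh FILE
# Replace punctuation with spaces and fold everything to lowercase.
text=$(tr '[:punct:]' ' ' < "${1}" | tr '[:upper:]' '[:lower:]')
# Unquoted expansion splits the text on whitespace into an array of words.
parsed=(${text})
# Print one word per line, sort so duplicates are adjacent, then count them.
for w in "${parsed[@]}"; do echo "${w}"; done | sort | uniq -c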
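The shell script and word_count.py should agree on most inputs, but they split words differently: the script's `parsed=(${text})` splits on any whitespace, while word_count.py splits each line on single spaces only. The Python 3 sketch below reproduces the script's behavior so the two outputs can be compared. `shell_style_counts` is a hypothetical helper, and the sketch assumes ASCII text in the C locale, where `tr '[:punct:]'` matches `string.punctuation` and `sort` orders by byte value like Python's `sorted`.

```python
import string
from collections import Counter

# In the C locale, tr's [:punct:] class is exactly these 32 ASCII characters.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def shell_style_counts(text):
    """Hypothetical Python 3 equivalent of word_count.sh; returns (count, word) pairs."""
    # tr '[:punct:]' ' ' | tr '[:upper:]' '[:lower:]'
    cleaned = text.translate(PUNCT_TO_SPACE).lower()
    # parsed=(${text}) splits on runs of spaces, tabs, and newlines, like str.split().
    words = cleaned.split()
    # sort | uniq -c: one entry per distinct word, in sorted order, count first.
    return [(n, word) for word, n in sorted(Counter(words).items())]


print(shell_style_counts("a\tb a"))  # prints: [(2, 'a'), (1, 'b')]
```

Here the tab-separated `a\tb` yields two words; word_count.py's `split(' ')` would instead keep `a\tb` as a single token.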
mkemp / Sample Text
Last active August 29, 2015 14:20
Used in CloudCamp Chicago 2015.05.11 presentation.
One morning, when Gregor Samsa woke from troubled dreams, he found himself
transformed in his bed into a horrible vermin. He lay on his armour-like back,
and if he lifted his head a little he could see his brown belly, slightly
domed and divided by arches into stiff sections. The bedding was hardly able
to cover it and seemed ready to slide off any moment. His many legs, pitifully
thin compared with the size of the rest of him, waved about helplessly as he
looked. "What's happened to me?" he thought. It wasn't a dream. His room, a
proper human room although a little too small, lay peacefully between its four
familiar walls. A collection of textile samples lay spread out on the table -
Samsa was a travelling salesman - and above it there hung a picture that he had