
tyokota / README.md
Last active May 2, 2016 05:47
2016 primary election

###### chart 1. 2016 Hawaii primary election results.

Pairing national election data and maps is a hot mess.

Recently, I have come across a few choropleth maps built from 2016 primary election data that present the relative proportion of voters by county for a single candidate. One example that stands out is a map of Bernie supporters in Alaska. The heavily colored map feels misleading once you consider that only ~600 Democrats participated. More people could fit in a Walmart on Black Friday.

Sometimes, a less sexy visualization tells a more interesting story. My modest stacked bar graph shows the proportion of Republican and Democratic voters (laying bare Hawaii's huge imbalance toward the Democratic Party) and the relative proportion of votes each candidate garnered.
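The README does not include the plotting code, so here is a minimal sketch of that kind of stacked bar graph with ggplot2; the counts are placeholders, not the actual 2016 Hawaii results.

```r
library(ggplot2)

# placeholder vote counts, for illustration only
votes <- data.frame(
  party     = c("Democratic", "Democratic", "Republican", "Republican"),
  candidate = c("Sanders", "Clinton", "Trump", "Cruz"),
  votes     = c(20000, 10000, 5000, 4000)
)

# one bar per party, stacked by candidate, so the party imbalance and each
# candidate's share are visible in a single chart
ggplot(votes, aes(x = party, y = votes, fill = candidate)) +
  geom_bar(stat = "identity") +
  labs(title = "2016 Hawaii primary results (placeholder data)",
       x = NULL, y = "votes")
```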

Choropleth maps can be beautiful and effective, especially when making state-by-state comparisons. Other times, less is more.

tyokota / README.md
Last active May 2, 2016 06:05
food radius

###### chart 1. my food radius.


“We purchase and consume 80% of our calories within 5 miles of our home” – Dr. Brian Wansink

The food radius map is the first exercise in *Slim by Design: Mindless Eating Solutions for Everyday Life* by Dr. Brian Wansink. Creating this map in R helped me understand my poor food choices.

Markers represent places I regularly visit to eat; many other places are not represented on this map. Even so, the food radius map revealed just how inundated my neighborhood is with fast food. And thinking about where that 80% of caloric intake comes from, it may be safe to say the majority of it comes from fast food places - yikes.
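The mapping code is not in the README, so here is a minimal sketch of the idea with the leaflet package; the home coordinates and eateries below are made up for illustration.

```r
library(leaflet)

# hypothetical home location and eateries -- not the actual data
home <- c(lng = -157.858, lat = 21.306)
eateries <- data.frame(
  name = c("drive-thru A", "plate lunch B", "burger chain C"),
  lng  = c(-157.852, -157.865, -157.849),
  lat  = c(21.309, 21.301, 21.312)
)

leaflet(eateries) %>%
  addTiles() %>%
  # 5-mile radius from the Wansink quote, converted to meters
  addCircles(lng = home[["lng"]], lat = home[["lat"]], radius = 5 * 1609.34) %>%
  addMarkers(~lng, ~lat, popup = ~name)
```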

tyokota / src.R
Last active June 14, 2016 04:11
BRFSS
# thomasyokota[at]gmail.com
# project/purpose: CDC BRFSS in R for free.99
# DEPENDENCIES -----------------------------------------------------------------
if (!requireNamespace('pacman', quietly = TRUE)) install.packages('pacman')
pacman::p_load(RCurl, foreign, downloader)
# DATA -------------------------------------------------------------------------
source_url("https://raw.githubusercontent.com/ajdamico/asdfree/master/Download%20Cache/download%20cache.R", prompt=F, echo=F)
# download ez-pz brought to you by anthony joseph damico [ajdamico@gmail.com]
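# a hedged sketch of a next step, not part of the original gist: once the
# helper above has cached a BRFSS transport (.xpt) file locally,
# foreign::read.xport() reads it into a data.frame (file name hypothetical)
brfss <- foreign::read.xport("LLCP2014.XPT")
str(brfss[, 1:5])  # quick look at the first few columns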
tyokota / index.html
Created July 16, 2016 08:26
Haversine Distance
This file has been truncated.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<script src="data:application/x-javascript;base64,..."></script> <!-- inlined htmlwidgets JavaScript runtime (base64), truncated -->
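The widget source above is R-generated boilerplate, so the distance logic itself is not visible. For reference, a minimal R implementation of the haversine great-circle distance the gist is named after; the test coordinates are arbitrary.

```r
# haversine great-circle distance between two lat/lon points, in kilometers
haversine <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

haversine(21.31, -157.86, 21.98, -159.37)  # Honolulu to Lihue, ~170 km
```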
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# FILE_DIR and the train/test DataFrames (each with a 'comment_text' column)
# are assumed to be defined upstream of this snippet.

def load_embedding(embedding):
    print(f'Loading {embedding} embedding..')
    # each line of a vector file is "word x1 x2 ... x300"
    def get_coefs(word, *arr):
        return word, np.asarray(arr, dtype='float32')
    if embedding == 'glove':
        EMBEDDING_FILE = f'{FILE_DIR}/embeddings/glove.840B.300d/glove.840B.300d.txt'
        embeddings_index = dict(get_coefs(*o.split(" ")) for o in open(EMBEDDING_FILE, encoding="utf8"))
    elif embedding == 'wiki-news':
        EMBEDDING_FILE = f'{FILE_DIR}/embeddings/wiki-news-300d-1M/wiki-news-300d-1M.vec'
        # len(o) > 100 skips the short header line of the fastText .vec file
        embeddings_index = dict(get_coefs(*o.split(" ")) for o in open(EMBEDDING_FILE, encoding="utf8") if len(o) > 100)
    elif embedding == 'paragram':
        # the paragram branch was truncated in the original snippet; this path
        # is an assumption modeled on the other branches
        EMBEDDING_FILE = f'{FILE_DIR}/embeddings/paragram_300_sl999/paragram_300_sl999.txt'
        embeddings_index = dict(get_coefs(*o.split(" ")) for o in open(EMBEDDING_FILE, encoding="utf8", errors='ignore') if len(o) > 100)
    return embeddings_index

# word-level TF-IDF features fit on the combined train+test vocabulary
text_vectorizer = TfidfVectorizer(
    sublinear_tf=True,
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',
    ngram_range=(1, 1),
    max_features=30000)
text_vectorizer.fit(pd.concat([train['comment_text'], test['comment_text']]))
# transform (not fit_transform) so the vocabulary fit above is reused
train_word_features = text_vectorizer.transform(train['comment_text'])