View char_ngram.py
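import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

text_vectorizer = TfidfVectorizer(
    sublinear_tf=True,
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',
    ngram_range=(1, 1),
    max_features=30000)
# Fit the vocabulary and IDF weights once, on train and test text together.
text_vectorizer.fit(pd.concat([train['comment_text'], test['comment_text']]))
# Only transform here: fit_transform would refit on train alone and discard the fit above.
train_word_features = text_vectorizer.transform(train['comment_text'])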
View concatenate_embeddings.py
import numpy as np

def load_embedding(embedding):
    print(f'Loading {embedding} embedding..')
    # Each line of an embedding file is a word followed by its vector components.
    def get_coefs(word, *arr):
        return word, np.asarray(arr, dtype='float32')
    if embedding == 'glove':
        EMBEDDING_FILE = f'{FILE_DIR}/embeddings/glove.840B.300d/glove.840B.300d.txt'
        embeddings_index = dict(get_coefs(*o.split(" ")) for o in open(EMBEDDING_FILE, encoding="utf8"))
    elif embedding == 'wiki-news':
        EMBEDDING_FILE = f'{FILE_DIR}/embeddings/wiki-news-300d-1M/wiki-news-300d-1M.vec'
        # len(o) > 100 skips the short header line at the top of the .vec file.
        embeddings_index = dict(get_coefs(*o.split(" ")) for o in open(EMBEDDING_FILE, encoding="utf8") if len(o)>100)
    elif embedding == 'paragram':
@tyokota
tyokota / index.html
Created Jul 16, 2016
Haversine Distance
View index.html
This file has been truncated, but you can view the full file.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
View src.R
# thomasyokota[at]gmail.com
# project/purpose: CDC BRFSS in R for free.99
# DEPENDENCIES -----------------------------------------------------------------
if (!requireNamespace('pacman', quietly = TRUE)) install.packages('pacman')
pacman::p_load(RCurl, foreign, downloader)
# DATA -------------------------------------------------------------------------
source_url("https://raw.githubusercontent.com/ajdamico/asdfree/master/Download%20Cache/download%20cache.R", prompt = FALSE, echo = FALSE)
# download ez-pz brought to you by anthony joseph damico [ajdamico@gmail.com]
tyokota / README.md
Last active May 2, 2016
food radius
View README.md

###### chart 1. my food radius.

“We purchase and consume 80% of our calories within 5 miles of our home” – Dr. Brian Wansink

The food radius map is the first exercise in Slim by Design: Mindless Eating Solutions for Everyday Life by Dr. Brian Wansink. Creating this map in R helped me understand my poor food choices.

Markers represent places where I regularly eat; many other places are not shown on this map. Even so, the food radius map revealed just how inundated my neighborhood is with fast food. And if I think about where that 80% of my caloric intake comes from, it is probably safe to say that most of it comes from fast food places. Yikes.
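
The 5-mile radius in the quote can be checked with a great-circle distance. The sketch below is not the R code behind the map; it is a minimal Python illustration that assumes each place is a latitude/longitude pair and uses the haversine formula to keep only the places within a given radius of home. The function names, the Earth radius constant, and the coordinates in the usage lines are all hypothetical placeholders.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(home, places, radius_miles=5.0):
    """Return the names of places no farther than radius_miles from home."""
    home_lat, home_lon = home
    return [
        name
        for name, (lat, lon) in places.items()
        if haversine_miles(home_lat, home_lon, lat, lon) <= radius_miles
    ]


# Hypothetical coordinates, for illustration only.
home = (0.0, 0.0)
places = {"near": (0.05, 0.0), "far": (0.10, 0.0)}
print(within_radius(home, places))
```

With real marker coordinates in place of the placeholders, the returned list would be the places that fall inside the food radius.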

tyokota / README.md
Last active May 2, 2016
2016 primary election
View README.md

###### chart 1. 2016 Hawaii primary election results.

pairing national election data and maps is a hot mess.

Recently, I have come across a few choropleth maps that use 2016 primary election data to show, county by county, the relative proportion of voters for a single candidate. One example that stands out is a map of Bernie supporters in Alaska. The heavily colored map feels misleading considering that only ~600 Democrats participated. More people could fit in a Walmart on Black Friday.

Sometimes, a less sexy visualization tells an interesting story. My modest stacked bar graph shows the proportion of Republican and Democratic voters (which portrays Hawaii’s huge imbalance towards the Democratic Party) and the relative proportion of votes each candidate garnered.
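
The two proportions behind a stacked bar graph like this can be computed from raw vote counts. The sketch below is not the code used for the chart; it is a minimal Python illustration that assumes the counts are stored as a nested dictionary of party, then candidate, then votes, and that every party has at least one vote. The function name, the party and candidate labels, and the numbers in the usage lines are made-up placeholders, not election results.

```python
def vote_shares(votes_by_party):
    """Compute each party's share of all votes and each candidate's share within a party."""
    total = sum(sum(candidates.values()) for candidates in votes_by_party.values())
    shares = {}
    for party, candidates in votes_by_party.items():
        party_total = sum(candidates.values())
        shares[party] = {
            # Width of the party's bar relative to all voters.
            "party_share": party_total / total,
            # Segments stacked inside the party's bar.
            "candidate_shares": {
                name: votes / party_total for name, votes in candidates.items()
            },
        }
    return shares


# Placeholder counts, for illustration only (not real results).
example = {
    "Party X": {"Candidate A": 30, "Candidate B": 10},
    "Party Y": {"Candidate C": 60},
}
print(vote_shares(example))
```

Keeping the party share next to the within-party shares is what lets a reader see both the imbalance between parties and each candidate's standing, without the raw turnout being hidden by color.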

Choropleth maps can be beautiful and effective, especially when making state-by-state comparisons. Other times, less