Nick Doiron mapmeld

mapmeld /
Last active Mar 25, 2020 — forked from W4ngatang/
Script for downloading data of the GLUE benchmark
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data and extract the files from it manually.
On Windows you can run the .msi installer directly; on Mac and Linux, use an external tool such as 'cabextract' (see below for an example).
You should then rename specific extracted files and place them in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
mapmeld /
Last active Feb 28, 2020
Nevada delegate issues

Assuming the final delegate counts and viability number are correct


  • Carson City 107: extra delegate, Biden's 2nd
  • Carson City 407: delegate should have been added to Biden, not Klobuchar
  • Clark 1621: needs to add 1 leftover delegate each to Buttigieg and Sanders
  • Clark 1642: unclear, assigned too many delegates instead of a +1 to Sanders
  • Clark 1643: removed Klobuchar's 1 delegate to match expected delegates, even though viable; all had 1 delegate
  • Clark 1645: removed Warren's 1 delegate though viable
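The corrections above all follow from the standard caucus allocation pattern: each viable group gets its proportional share of the precinct's delegates, and leftover delegates go to the groups with the largest remainders. A minimal sketch of that pattern (the floor-then-largest-remainder tie-breaking here is an assumption for illustration, not Nevada's official procedure):

```python
# Largest-remainder delegate allocation: a sketch, not the official NV rule.
def allocate(delegates, counts, viability):
    # drop groups below the viability threshold
    viable = {c: n for c, n in counts.items() if n >= viability}
    total = sum(viable.values())
    # raw proportional share for each viable group
    raw = {c: delegates * n / total for c, n in viable.items()}
    awarded = {c: int(r) for c, r in raw.items()}  # floor of each share
    leftover = delegates - sum(awarded.values())
    # hand leftover delegates to the largest fractional remainders
    for c in sorted(viable, key=lambda c: raw[c] - awarded[c], reverse=True)[:leftover]:
        awarded[c] += 1
    return awarded
```

Running this against a precinct's counts makes it easy to spot an "extra delegate" or a leftover delegate assigned to the wrong candidate.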
mapmeld /
Created Jan 8, 2020
Count number of saved plans
# calculate number of plans, by state
import json

plans = open('districtr_full_export.json', 'r').read().strip().split("\n")
places = {}
for raw in plans:
    plan = json.loads(raw)
    if ("plan" in plan) and ("placeId" in plan["plan"]):
        place = plan["plan"]["placeId"]
        if place in places:
            places[place] += 1
        else:
            places[place] = 1
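The same tally can be written more compactly with `collections.Counter`; a sketch assuming the same one-JSON-object-per-line export format:

```python
import json
from collections import Counter

def count_plans(lines):
    # tally how many saved plans reference each placeId
    places = Counter()
    for raw in lines:
        plan = json.loads(raw)
        if "plan" in plan and "placeId" in plan["plan"]:
            places[plan["plan"]["placeId"]] += 1
    return places
```

`Counter` also gives `most_common()` for free when ranking states by plan count.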
mapmeld /
Last active Jan 5, 2020
first-draft qa
from allennlp.predictors import Predictor
from transformers.tokenization_gpt2 import GPT2Tokenizer
from transformers import pipeline

class HuggingFacePredictor(Predictor):
    def __init__(self) -> None:
        self.tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
        self.model = pipeline('question-answering')

    def predict(self, passage='', question=''):
        return self.model(question=question, context=passage)

import json
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("")
qas = open("simplified-nq-test.jsonl").read().strip().split("\n")
for qa in qas:
    rep = json.loads(qa)
    best = rep['long_answer_candidates'][0]
    print('AllenNLP: ')
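For context, each line of `simplified-nq-test.jsonl` is one JSON record whose `long_answer_candidates` are token offsets into `document_text`. A hedged sketch of recovering the first candidate's text (field names follow the simplified Natural Questions release; the sample record in the test is invented):

```python
import json

def first_candidate_text(jsonl_line):
    # recover the first long-answer candidate as a span of whitespace tokens
    rep = json.loads(jsonl_line)
    tokens = rep['document_text'].split(' ')
    best = rep['long_answer_candidates'][0]
    return ' '.join(tokens[best['start_token']:best['end_token']])
```

That recovered span is what you would feed to either predictor as the passage.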
mapmeld /
Created Jan 2, 2020
State-specific maps of Native American Communities
from sys import argv
import json

# pip install fiona shapely shapely-geojson
import fiona
from shapely.geometry import shape
from shapely_geojson import dumps

if len(argv) < 2:
    print('usage: "New Mexico" > output.geojson')
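The `shapely_geojson.dumps` helper can be swapped for the stock `shapely.geometry.mapping` plus `json.dumps`, which avoids the extra dependency. A sketch assuming shapely is installed (the point geometry is invented):

```python
import json
from shapely.geometry import shape, mapping

# round-trip a GeoJSON-style geometry dict through shapely
geom = shape({'type': 'Point', 'coordinates': [-106.0, 35.0]})
geojson = json.dumps(mapping(geom))
```

The same `mapping`/`json.dumps` pair works on any shapely geometry read from fiona records.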
mapmeld /
Last active Dec 30, 2019

The number of awesome ML projects is limitless, but this list collects the ideas I grouped together as both awesome and seemingly achievable:

Open-ended Datasets

import pandas as pd

for lang in ['ar', 'en', 'ru', 'ja', 'tr', 'fa']:
    mentionsum = {}
    for doc in range(1, 10):  # ends at 9
        df = pd.read_csv("saudi_arabia_112019_tweets_csv_hashed_" + str(doc) + ".csv")
        rows = df[df['tweet_language'] == lang][['user_mentions']].values.tolist()
        df = None  # clear memory
        for row in rows:
            mentions = row[0].replace('[', '').replace(']', '').replace('\'', '').split(', ')
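The string surgery on `user_mentions` works because the CSV stores the column as the repr of a Python list; `ast.literal_eval` parses that form directly and survives commas or brackets inside values. A sketch, assuming cells really look like `"['a', 'b']"`:

```python
import ast

def parse_mentions(cell):
    # the CSV stores user_mentions as the repr of a Python list
    if not cell or cell == '[]':
        return []
    return [str(m) for m in ast.literal_eval(cell)]
```

Unlike `eval`, `literal_eval` only accepts Python literals, so a malformed cell raises instead of executing anything.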
mapmeld /
Last active Dec 29, 2019
import pandas as pd

dflangsum = None
for doc in range(1, 10):  # ends at 9
    df = pd.read_csv("saudi_arabia_112019_tweets_csv_hashed_" + str(doc) + ".csv")
    langcount = df[df['is_retweet'] == False].groupby(['tweet_language']).count()['tweetid']
    if dflangsum is not None:
        # align on language and treat languages missing from one side as zero
        dflangsum = dflangsum.add(langcount, fill_value=0)
    else:
        dflangsum = langcount
    df = None  # free memory
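An alternative to accumulating a running Series is to collect each file's counts and sum them in one step; `pd.concat` aligns the language index and `.sum` skips the `NaN`s left by languages missing from some files. A sketch on invented in-memory frames:

```python
import pandas as pd

def total_lang_counts(frames):
    # per-file counts of original (non-retweet) tweets by language,
    # summed across files so missing languages count as zero
    counts = [
        df[df['is_retweet'] == False].groupby('tweet_language').count()['tweetid']
        for df in frames
    ]
    return pd.concat(counts, axis=1).sum(axis=1)
```

This keeps only one file's counts in memory at a time if `frames` is a generator of freshly read DataFrames.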
# BASH dependencies
apt-get install python-opencv ffmpeg
pip install keras numpy shap matplotlib pillow
rm ./drive/My\ Drive/mlin/training/*/*.jpg
rm ./drive/My\ Drive/mlin/validation/*/*.jpg
# native imports