Nick Doiron (mapmeld)

@mapmeld
mapmeld / download_glue_data.py
Last active Mar 25, 2020 — forked from W4ngatang/download_glue_data.py
Script for downloading data of the GLUE benchmark (gluebenchmark.com)
View download_glue_data.py
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
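A minimal sketch of running that extraction step from Python instead of the shell. The --data_dir and --tasks flags are assumptions based on the upstream W4ngatang script; check your copy's argparse options before relying on them.

# assumed flags (--data_dir / --tasks); verify against the script's argparse setup
import os
import subprocess

os.makedirs("MRPC", exist_ok=True)
subprocess.run(["cabextract", "MSRParaphraseCorpus.msi", "-d", "MRPC"], check=True)
subprocess.run(
    ["python", "download_glue_data.py", "--data_dir", "glue_data", "--tasks", "all"],
    check=True,
)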
@mapmeld
mapmeld / issues.md
Last active Feb 28, 2020
Nevada delegate issues
View issues.md

Assuming the final delegate counts and viability numbers are correct (the apportionment arithmetic these notes check against is sketched after the list)

Unusual

  • Carson City 107: extra delegate, Biden's 2nd
  • Carson City 407: delegate should have been added to Biden, not Klobuchar
  • Clark 1621: needs to add 1 leftover delegate each to Buttigieg and Sanders
  • Clark 1642: unclear, assigned too many delegates instead of a +1 to Sanders
  • Clark 1643: removed Klobuchar's 1 delegate to match expected delegates, even though viable; all had 1 delegate
  • Clark 1645: removed Warren's 1 delegate though viable
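
A rough sketch of the arithmetic assumed above: each viable candidate is awarded round(supporters * precinct delegates / attendees), and leftover or excess seats are resolved by fractional remainder. This is a simplification; realignment and tie-break rules are not modeled.

# Rough sketch of the apportionment math assumed by the notes above.
# Simplification: realignment and tie-breaks are not modeled.
def award_delegates(counts, delegates, attendees, viability):
    viable = {c: n for c, n in counts.items() if n >= viability}
    raw = {c: n * delegates / attendees for c, n in viable.items()}
    awarded = {c: round(r) for c, r in raw.items()}
    # hand out leftover seats to the largest fractional remainders,
    # or claw back excess seats from the smallest
    while sum(awarded.values()) < delegates:
        c = max(raw, key=lambda name: raw[name] - awarded[name])
        awarded[c] += 1
    while sum(awarded.values()) > delegates:
        c = min(raw, key=lambda name: raw[name] - awarded[name])
        awarded[c] -= 1
    return awarded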
@mapmeld
mapmeld / calc_districtr_plans.py
Created Jan 8, 2020
Count number of saved plans
View calc_districtr_plans.py
# calculate number of plans, by state
import json

plans = open('districtr_full_export.json', 'r').read().strip().split("\n")
places = {}
for raw in plans:
    plan = json.loads(raw)
    if ("plan" in plan) and ("placeId" in plan["plan"]):
        place = plan["plan"]["placeId"]
        if place in places:
            places[place] += 1
        else:
            places[place] = 1

print(places)
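Each line of districtr_full_export.json is assumed to be one JSON object describing a saved plan; a hypothetical, invented record just to show the shape the loop expects:

# a hypothetical export line (all values invented) showing the shape the loop expects
sample = '{"plan": {"placeId": "north_carolina", "assignment": {"37001": 1, "37003": 2}}}'
print(json.loads(sample)["plan"]["placeId"])  # -> north_carolina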
@mapmeld
mapmeld / 1draft.py
Last active Jan 5, 2020
first-draft qa
View 1draft.py
from allennlp.predictors import Predictor
from transformers.tokenization_gpt2 import GPT2Tokenizer
from transformers import pipeline

class HuggingFacePredictor(Predictor):
    def __init__(self) -> None:
        self.tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
        self.model = pipeline('question-answering')

    def predict(self, passage='', question=''):
        # hand the passage/question pair to the HuggingFace QA pipeline
        return self.model(question=question, context=passage)
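A hypothetical call to the class above; the passage and question strings are invented for illustration.

predictor = HuggingFacePredictor()
result = predictor.predict(
    passage="Nick Doiron published the gist in January 2020.",  # invented example text
    question="When was the gist published?",
)
print(result)  # the QA pipeline returns a dict with 'answer' and 'score'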
View qa.py
import json

from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bidaf-elmo-model-2018.11.30-charpad.tar.gz")
qas = open("simplified-nq-test.jsonl").read().split("\n")
for qa in qas:
    if not qa.strip():
        continue
    rep = json.loads(qa)
    best = rep['long_answer_candidates'][0]
    # pull the candidate passage out of the whitespace-tokenized document text
    passage = " ".join(rep['document_text'].split(" ")[best['start_token']:best['end_token']])
    print(rep['question_text'])
    print('AllenNLP: ')
    print(predictor.predict(
        passage=passage,
        question=rep['question_text'],
    ))
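For reference, each line of simplified-nq-test.jsonl is assumed to follow the simplified Natural Questions release; a heavily shortened, invented record illustrating the fields the loop uses:

# invented, heavily shortened record (field names follow the simplified NQ release)
rep = {
    "example_id": 1234567890,
    "question_text": "when was this gist created",
    "document_text": "The gist was created in January 2020 .",
    "long_answer_candidates": [{"start_token": 0, "end_token": 8, "top_level": True}],
}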
@mapmeld
mapmeld / state_specific.py
Created Jan 2, 2020
State-specific maps of Native American Communities
View state_specific.py
from sys import argv
import json
# pip install fiona shapely shapely-geojson
import fiona
from shapely.geometry import shape
from shapely_geojson import dumps

if len(argv) < 2:
    print('usage: gen_map.py "New Mexico" > output.geojson')
    exit(1)
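The preview cuts off here. Below is a minimal sketch of the fiona -> shapely -> GeoJSON flow the imports suggest; the shapefile path and the STATE property name are hypothetical, not the gist's actual inputs.

from shapely.ops import unary_union
from shapely_geojson import Feature

state = argv[1]
shapes = []
with fiona.open("native_american_communities.shp") as src:  # hypothetical input layer
    for feat in src:
        if feat["properties"].get("STATE") == state:  # hypothetical attribute name
            shapes.append(shape(feat["geometry"]))

# merge the matching communities and print them as GeoJSON
print(dumps(Feature(unary_union(shapes))))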
@mapmeld
mapmeld / 2020_ml.md
Last active Dec 30, 2019
2020_ml_problems.md
View 2020_ml.md

The number of awesome ML projects is limitless, but this page lists the project ideas that I grouped together as awesome and seemingly achievable:

Open-ended Datasets

View mentionsum.py
import pandas as pd

for lang in ['ar', 'en', 'ru', 'ja', 'tr', 'fa']:
    mentionsum = {}
    for doc in range(1, 10): # ends at 9
        print(doc)
        df = pd.read_csv("saudi_arabia_112019_tweets_csv_hashed_" + str(doc) + ".csv")
        rows = df[df['tweet_language'] == lang][['user_mentions']].values.tolist()
        df = None # clear memory
        for row in rows:
            mentions = row[0].replace('[','').replace(']','').replace('\'','').split(', ')
            for mention in mentions:
                if mention:
                    mentionsum[mention] = mentionsum.get(mention, 0) + 1
    # report the most-mentioned accounts for this language
    top = sorted(mentionsum.items(), key=lambda kv: kv[1], reverse=True)[:20]
    print(lang, top)
@mapmeld
mapmeld / langsum.py
Last active Dec 29, 2019
LangSum.py
View langsum.py
import pandas as pd

dflangsum = None
for doc in range(1, 10): # ends at 9
    df = pd.read_csv("saudi_arabia_112019_tweets_csv_hashed_" + str(doc) + ".csv")
    langcount = df[df['is_retweet'] == False].groupby(['tweet_language']).count()['tweetid']
    if dflangsum is not None:
        # align on language and treat languages missing from a file as zero instead of NaN
        dflangsum = dflangsum.add(langcount, fill_value=0)
    else:
        dflangsum = langcount
    df = None # memory

print(dflangsum)
View face_classifier.py
"""
# BASH dependencies
apt-get install python-opencv ffmpeg
pip install keras numpy shap matplotlib pillow
rm ./drive/My\ Drive/mlin/training/*/*.jpg
rm ./drive/My\ Drive/mlin/validation/*/*.jpg
"""
# native imports
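The preview ends at the imports. Below is a minimal sketch of the kind of Keras image classifier the dependency list and the training/validation folders suggest; every path, image size, and layer choice here is an assumption, not the gist's actual model.

# sketch only: paths, image sizes, and architecture are assumptions
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# hypothetical paths matching the folders cleaned up in the docstring above
train_dir = "./drive/My Drive/mlin/training"
val_dir = "./drive/My Drive/mlin/validation"

datagen = ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory(train_dir, target_size=(150, 150), batch_size=32, class_mode="binary")
val_gen = datagen.flow_from_directory(val_dir, target_size=(150, 150), batch_size=32, class_mode="binary")

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_gen, validation_data=val_gen, epochs=10)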