Nick Doiron mapmeld

mapmeld /
Last active Jan 4, 2021
Bangla Benchmark runs


Can these scores be improved? YES!

Options include rerunning with more training data, training for more epochs, or using other libraries to set the learning rate and other hyperparameters before training.

  • Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task)

The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
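
One way to keep the process identical across models is to build every run's settings from a single shared dictionary and vary only the setting under test. This is a minimal sketch, not the benchmark notebook itself: the helper name `benchmark_args`, the model labels in `MODELS`, and the values in `BASE_ARGS` are assumptions for illustration, with keys borrowed from simpletransformers-style args.

```python
# Shared settings for every model in the comparison (assumed values, for illustration).
BASE_ARGS = {
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "num_train_epochs": 3,
}


def benchmark_args(**overrides):
    """Return a fresh copy of the shared settings with optional overrides."""
    args = dict(BASE_ARGS)
    args.update(overrides)
    return args


# Hypothetical model labels; substitute the models actually being compared.
MODELS = ["muril", "other-model"]

# Baseline: every model gets exactly the same settings, so scores are comparable.
runs = {name: benchmark_args() for name in MODELS}

# Separate experiment: double the epochs for every model, not just one of them.
doubled = {
    name: benchmark_args(num_train_epochs=2 * BASE_ARGS["num_train_epochs"])
    for name in MODELS
}
```

Keeping the overrides in one place makes it easier to see when a reported improvement comes from a changed setting rather than from the model.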

mapmeld /
Last active Oct 22, 2020
mapmeld /
Last active Jul 16, 2020
from simpletransformers.classification import ClassificationModel
# set use_cuda=False on CPU-only platforms
model = ClassificationModel('bert', 'monsoon-nlp/dv-wave', num_labels=8, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True,
})
mapmeld /
Created Jul 5, 2020
Add JSON block data to a shapefile with GDAL
# pip install gdal
import json
from osgeo import ogr
# depends on your shapefile
target_shapefile = 'tl_2010_sample_shapefile.shp'
fips_id = 'GEOID10'
# load the per-block data, closing the file when done
with open('savefile.json', 'r') as f:
    saveblocks = json.load(f)
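
A hedged sketch of how the join could proceed from here. It assumes `savefile.json` maps each block's GEOID to a single integer (the real file layout is not shown in this preview) and uses a hypothetical field name, `SAVED`. The GDAL import sits inside the function so that the lookup helper can be used without GDAL installed.

```python
def build_lookup(saveblocks):
    """Normalize keys to strings so they compare equal to shapefile GEOID values."""
    return {str(geoid): value for geoid, value in saveblocks.items()}


def add_field_from_lookup(shapefile_path, id_field, lookup, new_field="SAVED"):
    """Add an integer field to the shapefile and fill it from `lookup`.

    Returns the number of features updated. `new_field` is a hypothetical
    name; shapefile field names are limited to 10 characters.
    """
    from osgeo import ogr  # imported here so build_lookup works without GDAL

    datasource = ogr.Open(shapefile_path, 1)  # 1 = open for update
    if datasource is None:
        raise IOError("could not open " + shapefile_path)
    layer = datasource.GetLayer()
    layer.CreateField(ogr.FieldDefn(new_field, ogr.OFTInteger))

    updated = 0
    for feature in layer:
        key = str(feature.GetField(id_field))
        if key in lookup:
            feature.SetField(new_field, int(lookup[key]))
            layer.SetFeature(feature)  # write the change back to the layer
            updated += 1
    datasource = None  # drop the reference so GDAL flushes changes to disk
    return updated
```

With the variables defined above, a call might look like `add_field_from_lookup(target_shapefile, fips_id, build_lookup(saveblocks))`.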
mapmeld /
Last active Jul 8, 2020
Load 5-year ACS race + ethnicity data, ending in 2017
# pip install requests
import time, json
import requests
api_key = "API_KEY_STRING"
# look up FIPS for state and county:
state = '12'
county_fips = ['086']
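
A sketch of how the request URLs could be built from these settings, under a few assumptions: the 2017 5-year ACS endpoint on api.census.gov, tract-level geography, and two columns from table B03002 (Hispanic or Latino Origin by Race). The function name `acs_url` is hypothetical, and the geography and variable list the original script uses are not shown in this preview.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.census.gov/data/2017/acs/acs5"

# Values repeated from the snippet above so this block runs on its own.
api_key = "API_KEY_STRING"
state = "12"
county_fips = ["086"]


def acs_url(state, county, api_key,
            variables=("NAME", "B03002_001E", "B03002_012E"),
            geography="tract"):
    """Build one Census API query URL for a single county.

    The default variables (total population, Hispanic or Latino) and the
    tract geography are assumptions chosen for illustration.
    """
    params = [
        ("get", ",".join(variables)),
        ("for", geography + ":*"),
        ("in", "state:" + state),
        ("in", "county:" + county),
        ("key", api_key),
    ]
    # Keep ':', '*' and ',' unescaped so the URL stays readable.
    return BASE_URL + "?" + urlencode(params, safe=":*,", quote_via=quote)


urls = [acs_url(state, county, api_key) for county in county_fips]
```

Each URL can then be fetched with `requests.get(url).json()`; pausing between calls with `time.sleep` is a courtesy when looping over many counties.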
mapmeld /
Last active May 13, 2020
References and links for Spanish counterfactuals
View AutoKeras_image_regression.ipynb
mapmeld /
Created Apr 27, 2020
Adjusting to return raw boxes and classes for images
# -*- coding: utf-8 -*-
"""Class definition of YOLO_v3 style detection model on image and video"""
import colorsys
import os
from timeit import default_timer as timer
import numpy as np
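
The adjusted detection code is not shown in this preview. As a rough sketch of what "raw boxes and classes" could look like once returned, the hypothetical helper below packs parallel box, score, and class sequences into plain dictionaries instead of drawing them on the image. The (top, left, bottom, right) box order is an assumption; check it against the model's actual output.

```python
def raw_detections(boxes, scores, classes, class_names):
    """Pack parallel detection outputs into a list of plain dictionaries.

    Assumes each box is (top, left, bottom, right) and each class entry
    is an index into `class_names`.
    """
    results = []
    for box, score, class_id in zip(boxes, scores, classes):
        top, left, bottom, right = (float(v) for v in box)
        results.append({
            "class": class_names[int(class_id)],
            "score": float(score),
            "box": {"top": top, "left": left, "bottom": bottom, "right": right},
        })
    return results
```

Because the values are converted to built-in floats and strings, the result can be passed straight to `json.dumps`.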

Releasing Hindi ELECTRA model

This is a first attempt at a Hindi language model trained with Google Research's ELECTRA. I don't modify ELECTRA until we get into finetuning, and only then because there are hardcoded train and test files.


Additional background:

It's available on HuggingFace, along with sample usage.

mapmeld /
Last active Mar 25, 2020 — forked from W4ngatang/
Script for downloading data of the GLUE benchmark (
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from ( and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC