Nick Doiron mapmeld

  • Chicago, IL
@mapmeld
mapmeld / add_data_task.py
Created July 9, 2021 12:40
Add text file task to T5
import functools
import t5

# register a task that reads one example per line from text files
t5.data.TaskRegistry.add(
    "byt5_ex",
    t5.data.TextLineTask,
    split_to_filepattern={
        "train": "gs://BUCKET/train_lines.txt",
        "validation": "gs://BUCKET/validation_lines.txt",
    },
    text_preprocessor=[
        functools.partial(
            t5.data.preprocessors.parse_tsv,
@mapmeld
mapmeld / CensusAPI.txt
Created August 2, 2012 23:01
Using the Census API
NOTE: This how-to was written for the Census API at http://thedataweb.rm.census.gov/ -- it has since been moved to http://api.census.gov/
Mike Stucka, our contact at the Macon Telegraph, sent us a link to the Census's official API which is launching next month. You can skip ahead to the site - http://www.census.gov/developers/ - and get an API key, but also read my notes after using this yesterday:
1) The datasets
--- The 2010 Census Summary comes from everyone filling out census forms, and you can get stats at state level down to a super-detailed block level. Info from this includes population, age, gender, race, home ownership, members of a household, and various combinations of that. Full list: http://www.census.gov/developers/data/sf1.xml
--- The 2006-2010 American Community Survey is a longer form given to fewer households over 5 years (so its numbers are incompatible with the 2010 Census). You can get stats down only to the block group level. In addition to the standard census stats, you get: educa
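A rough sketch of what a request for the 2010 Census Summary data might look like from Python. The endpoint path on the new host (`/data/2010/dec/sf1`), the variable `P001001` for total population, and the helper names `build_sf1_url` and `rows_to_dicts` are assumptions for illustration and are not taken from the notes above; check the variable list linked above before relying on them. The API answers with a JSON array of rows whose first row is the header, so the second helper turns that into dictionaries.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2010 Census Summary File 1 on api.census.gov.
SF1_BASE_URL = "https://api.census.gov/data/2010/dec/sf1"


def build_sf1_url(variables, geography, api_key, within=None):
    """Build a query URL (hypothetical helper, not part of any Census library).

    variables: list of variable codes, e.g. ["P001001", "NAME"]
    geography: the "for" clause, e.g. "county:*"
    within:    optional "in" clause, e.g. "state:13"
    """
    params = [("get", ",".join(variables)), ("for", geography)]
    if within:
        params.append(("in", within))
    params.append(("key", api_key))
    # keep ":", "*" and "," readable instead of percent-encoding them
    return SF1_BASE_URL + "?" + urlencode(params, safe=":*,")


def rows_to_dicts(rows):
    """Convert the API's list-of-lists response (header row first) to dicts."""
    header = rows[0]
    return [dict(zip(header, row)) for row in rows[1:]]


if __name__ == "__main__":
    # API_KEY_STRING is a placeholder; state 13 is Georgia's FIPS code
    print(build_sf1_url(["P001001", "NAME"], "county:*", "API_KEY_STRING", within="state:13"))
```

Fetching the URL (for example with `requests.get(url).json()`) and passing the result to `rows_to_dicts` would give one dictionary per county.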
@mapmeld
mapmeld / bb.md
Last active January 4, 2021 16:01
Bangla Benchmark runs

Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing

Can these scores be improved? YES!

Options include rerunning with more training data, training for more epochs, or using other libraries to set the learning rate and other hyperparameters before training.

  • Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5->69.7 on one task)

The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
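One way to picture "a reasonable and identical process" is a loop that hands every model the same settings. This is only a sketch, not the code from the Colab notebook: `run_benchmark`, `train_and_score`, and the settings dictionary are hypothetical names, and the training function is left for the caller to supply.

```python
def run_benchmark(model_names, train_and_score, shared_settings):
    """Run every model through the same training and evaluation settings.

    train_and_score is a caller-supplied function (hypothetical here) that
    takes a model name and a settings dict and returns a score.
    """
    scores = {}
    for name in model_names:
        # pass a copy so one run cannot change the settings seen by later runs
        scores[name] = train_and_score(name, dict(shared_settings))
    return scores
```

Tuning a single model would mean changing `shared_settings` for that model only, which is exactly what the benchmark avoids.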

@mapmeld
mapmeld / twiml-lightning-share.md
Last active October 22, 2020 15:38
twiml-lightning-share
@mapmeld
mapmeld / dv-wave.py
Last active July 16, 2020 18:29
PythonCode
from simpletransformers.classification import ClassificationModel
# set use_cuda=False on CPU-only platforms
model = ClassificationModel('bert', 'monsoon-nlp/dv-wave', num_labels=8, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
@mapmeld
mapmeld / load_acs.py
Last active July 8, 2020 16:15
Load 5-year ACS race + ethnicity data, ending in 2017
# pip install requests
import time, json
import requests
api_key = "API_KEY_STRING"
# look up FIPS for state and county:
# https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697
state = '12'
county_fips = ['086']
@mapmeld
mapmeld / add_to_shapefile.py
Created July 5, 2020 23:10
Add JSON block data to a shapefile with GDAL
# pip install gdal
import json
from osgeo import ogr
# depends on your shapefile
target_shapefile = 'tl_2010_sample_shapefile.shp'
fips_id = 'GEOID10'
# read the saved block data, closing the file when done
with open('savefile.json', 'r') as f:
    saveblocks = json.load(f)
@mapmeld
mapmeld / links.md
Last active May 13, 2020 04:19
References and links for Spanish counterfactuals
@mapmeld
mapmeld / AutoKeras_image_regression.ipynb
Created April 28, 2020 21:22
AutoKeras Image Regression
@mapmeld
mapmeld / yolo.py
Created April 27, 2020 05:16
Adjusting yolo.py to return raw boxes and classes for images
# -*- coding: utf-8 -*-
"""
Class definition of YOLO_v3 style detection model on image and video
"""
import colorsys
import os
from timeit import default_timer as timer
import numpy as np