Nick Doiron mapmeld

@mapmeld
mapmeld / chatgpt-on-gptnyc.md
Created March 12, 2023 02:21
ChatGPT on GPT-NYC type questions

Date: February 25, 2023

Questions in quotes

My comments in bold italics

Hi, I'm going to ask some questions about New York City as a new visitor, and you should respond as an expert resident.

Sure, I'm happy to help! What would you like to know about New York City?

mapmeld / example.py
Created December 16, 2021 17:04
How to write an ML example
# All I'm looking for on an ML example:
# ! pip install name_of_library
from name_of_library import model, other_stuff
tdata = load_data_from_file()  # load my own file, not a built-in datasets source where I'd need to write Python to add data
tdata.apply(changes)  # show an edit step; no real dataset is so perfect that we never edit it
model.train(tdata, **explained_params)
mapmeld / patching_models_bigsci_proposal.md
Last active December 14, 2021 03:11
Patching Models BigSci Proposal

Patching Models with New Words, People, and Events

May 6 - June 15, 2021

Scope

Once a large pre-trained language model is published, it is a snapshot of language as of when its corpus was collected. What are ways to update models to support new or newly frequent terms (BIPOC), phrasing (social distancing), or people and events (Fyre Festival)? What are reliable, low-cost ways to test and benchmark these methods of updating?

Current status

mapmeld / Vanguard-Sortfix.js
Last active December 1, 2021 16:43
Sort stocks by percent change or my holdings change
/*
Generally, don't run random JS in your browser console, especially on financial sites, but here we are
By default this sorts by Percent Change. If you uncomment the next line it sorts by myDelta (price x your shares)
Caveats:
- I'm not affiliated with Vanguard or any licensed financial advisor or tax preparer. I don't have a clue what's going on with your finances.
- The script assumes you did NOT trade today; it uses today's change and current shares
- Delta-sort does not handle penny stocks as well because the UI says 0.01 and we reverse-engineer from current balance
*/
let sortRule = 'pct';
mapmeld / add_data_task.py
Created July 9, 2021 12:40
Add text file task to T5
import functools

import t5

t5.data.TaskRegistry.add(
    "byt5_ex",
    t5.data.TextLineTask,
    split_to_filepattern={
        "train": "gs://BUCKET/train_lines.txt",
        "validation": "gs://BUCKET/validation_lines.txt",
    },
    text_preprocessor=[
        functools.partial(
            t5.data.preprocessors.parse_tsv,
mapmeld / bb.md
Last active January 4, 2021 16:01
Bangla Benchmark runs

Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing

Can these scores be improved? YES!

Options include rerunning with more training data, training for more epochs, or using other libraries to set a learning rate and other hyperparameters before training.

  • Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task)

The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
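
A minimal sketch of what "a reasonable and identical process" could look like in code: every model is fine-tuned with the same hyperparameters, so differences in score come from the model and not from per-model tuning. The names `SHARED_PARAMS`, `run_benchmark`, and `placeholder_scorer` are hypothetical and not taken from the linked notebook, the hyperparameter values are illustrative assumptions, and the scorer returns a dummy number instead of a real result.

```python
from typing import Callable, Dict, Iterable

# Hypothetical shared settings; the values are illustrative, not the benchmark's.
SHARED_PARAMS = {"num_train_epochs": 3, "learning_rate": 4e-5}


def run_benchmark(
    model_names: Iterable[str],
    finetune_and_score: Callable[[str, dict], float],
    params: dict = SHARED_PARAMS,
) -> Dict[str, float]:
    """Run every model through the same process with identical hyperparameters."""
    results = {}
    for name in model_names:
        # Pass a copy so one run cannot change the settings used by the next.
        results[name] = finetune_and_score(name, dict(params))
    return results


def placeholder_scorer(model_name: str, params: dict) -> float:
    """Stand-in for real fine-tuning and evaluation; returns a dummy score."""
    return float(len(model_name))


print(run_benchmark(["model-a", "model-bb"], placeholder_scorer))
```

In practice `placeholder_scorer` would be swapped for a function that fine-tunes and evaluates the named model; the harness itself stays the same, which is the point of the benchmark.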

mapmeld / twiml-lightning-share.md
Last active October 22, 2020 15:38
twiml-lightning-share
mapmeld / dv-wave.py
Last active July 16, 2020 18:29
PythonCode
from simpletransformers.classification import ClassificationModel

# set use_cuda=False on CPU-only platforms
model = ClassificationModel('bert', 'monsoon-nlp/dv-wave', num_labels=8, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})
mapmeld / add_to_shapefile.py
Created July 5, 2020 23:10
Add JSON block data to a shapefile with GDAL
# pip install gdal
import json
from osgeo import ogr
# depends on your shapefile
target_shapefile = 'tl_2010_sample_shapefile.shp'
fips_id = 'GEOID10'
# read the saved block data; the context manager closes the file afterwards
with open('savefile.json', 'r') as f:
    saveblocks = json.load(f)
mapmeld / load_acs.py
Last active July 8, 2020 16:15
Load 5-year ACS race + ethnicity data, ending in 2017
# pip install requests
import time, json
import requests
api_key = "API_KEY_STRING"
# look up FIPS for state and county:
# https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697
state = '12'
county_fips = ['086']
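
The preview stops after the setup variables, so the request itself is not shown. As a rough sketch only, here is one way a query URL for the 2017 ACS 5-year endpoint could be assembled from an API key, a state FIPS code, and a county FIPS code. The helper `build_acs_url`, the county-level geography, and the variable codes (from table B03002, Hispanic or Latino origin by race) are my assumptions and may differ from what the original script requests; check the codes against the Census variable list before relying on them. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# 2017 ACS 5-year endpoint of the Census Data API
BASE_URL = "https://api.census.gov/data/2017/acs/acs5"

# Assumed variables: total population and Hispanic or Latino count from table B03002
DEFAULT_VARIABLES = ("B03002_001E", "B03002_012E")


def build_acs_url(api_key, state, county, variables=DEFAULT_VARIABLES):
    """Build a county-level query URL; does not perform the request."""
    query = {
        "get": ",".join(("NAME",) + tuple(variables)),
        "for": f"county:{county}",
        "in": f"state:{state}",
        "key": api_key,
    }
    return BASE_URL + "?" + urlencode(query)


# Example with the placeholder key and FIPS codes from the snippet above
for fips in ["086"]:
    print(build_acs_url("API_KEY_STRING", "12", fips))
```

The resulting URL could then be fetched with `requests.get(url).json()`, with a short `time.sleep` between counties if many are requested.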