Nick Doiron mapmeld

## add_data_task.py
import functools

import t5

t5.data.TaskRegistry.add(
    "byt5_ex",
    t5.data.TextLineTask,
    split_to_filepattern={
        "train": "gs://BUCKET/train_lines.txt",
        "validation": "gs://BUCKET/validation_lines.txt",
    },
    text_preprocessor=[
        functools.partial(
            t5.data.preprocessors.parse_tsv,

## CensusAPI.txt
NOTE: This how-to was written for the Census API at http://thedataweb.rm.census.gov/ -- it has since been moved to http://api.census.gov/

Mike Stucka, our contact at the Macon Telegraph, sent us a link to the Census's official API, which launches next month. You can skip ahead to the site - http://www.census.gov/developers/ - and get an API key, but also read my notes from using it yesterday:

1) The datasets

--- The 2010 Census Summary comes from everyone filling out census forms, and you can get stats at state level down to a super-detailed block level. Info from this includes population, age, gender, race, home ownership, members of a household, and various combinations of that. Full list: http://www.census.gov/developers/data/sf1.xml

--- The 2006-2010 American Community Survey is a longer form given to fewer households over 5 years (so its numbers are incompatible with the 2010 Census). You can get stats down only to the block group level. In addition to the standard census stats, you get: educa
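
As a rough illustration of querying the 2010 Census Summary data described above, here is a minimal sketch that only builds a request URL. The endpoint path, the variable code `P001001` (assumed to be total population) and the helper name `build_census_url` are assumptions for illustration and do not come from these notes; check the developer site for the current values.

```python
from urllib.parse import urlencode

# Assumed endpoint layout on api.census.gov; verify against the developer site.
SF1_2010_ENDPOINT = "https://api.census.gov/data/2010/dec/sf1"


def build_census_url(variables, geography, api_key, endpoint=SF1_2010_ENDPOINT):
    """Return a query URL for the given variable codes and geography clause.

    This is a hypothetical helper, not part of any Census library.
    """
    params = {
        "get": ",".join(variables),  # comma-separated variable codes
        "for": geography,            # e.g. "state:*" for every state
        "key": api_key,
    }
    # Keep ':', '*' and ',' readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=":*,")


# P001001 is assumed here to be the total-population variable.
url = build_census_url(["NAME", "P001001"], "state:*", "API_KEY_STRING")
print(url)
```

The sketch stops at the URL so it can be checked without network access; passing the result to `requests.get` would be the next step.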

## bb.md

      
Bangla Benchmark runs (gist by mapmeld, last active January 4, 2021)

Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing

Can these scores be improved? YES!
Rerun with more training data or more training epochs, or use other libraries to set the learning rate and other hyperparameters before training.

Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task).

The point of a benchmark is to run these models through a reasonable and identical process;
you can tweak hyperparameters on any model to improve results.
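
One way to picture the "reasonable and identical process" is a small harness that hands the same settings to every model. This is only a sketch under stated assumptions: `run_benchmark`, `SHARED_ARGS`, the hyperparameter values and the stub trainer are all hypothetical and are not taken from the linked notebook.

```python
# Shared settings applied to every model; the values are illustrative only.
SHARED_ARGS = {"num_train_epochs": 3, "learning_rate": 4e-5}


def run_benchmark(model_names, train_and_eval, shared_args=SHARED_ARGS):
    """Run each model through the same process and collect its score."""
    results = {}
    for name in model_names:
        # Pass a copy so one run cannot change the settings seen by the next.
        results[name] = train_and_eval(name, dict(shared_args))
    return results


# Stub standing in for real fine-tuning and evaluation (hypothetical).
seen_args = []


def stub_train_and_eval(model_name, args):
    seen_args.append(args)  # record the settings each model received
    return 0.0              # placeholder score, not a real result


scores = run_benchmark(["model-a", "model-b"], stub_train_and_eval)
print(scores)
```

Replacing the stub with a function that fine-tunes and scores a real model keeps the comparison fair, because no model gets its own tuned settings.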

  
## twiml-lightning-share.md

      
twiml-lightning-share (gist by mapmeld, last active October 22, 2020)

Measuring Gender Bias in Spanish Language Models

Presenter

Nick Doiron, Tufts University / Independent Research
GitHub: https://github.com/mapmeld ; LinkedIn: https://www.linkedin.com/in/nickdoiron/

Context


## dv-wave.py
from simpletransformers.classification import ClassificationModel

# set use_cuda=False on CPU-only platforms
model = ClassificationModel('bert', 'monsoon-nlp/dv-wave', num_labels=8, use_cuda=True, args={
    'reprocess_input_data': True,
    'use_cached_eval_features': False,
    'overwrite_output_dir': True,
    'num_train_epochs': 3,
    'silent': True
})

## load_acs.py
# pip install requests
import time, json
import requests

api_key = "API_KEY_STRING"

# look up FIPS for state and county:
# https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697
state = '12'
county_fips = ['086']

## add_to_shapefile.py
# pip install gdal
import json
from osgeo import ogr

# depends on your shapefile
target_shapefile = 'tl_2010_sample_shapefile.shp'
fips_id = 'GEOID10'

# read the saved blocks; the with-block closes the file handle afterwards
with open('savefile.json', 'r') as savefile:
    saveblocks = json.load(savefile)

## links.md

      
References and links for Spanish counterfactuals (gist by mapmeld, last active May 13, 2020)

Related Research

Towards Debiasing Sentence Representations (Liang et al., 2020)
English BERT and ELMo

https://cs.cmu.edu/~pliang/papers/acl2020_debiasing.pdf
https://github.com/pliang279/sent_debias

Measuring Bias in Contextualized Word Representations (Kurita et al., 2019)
Analyzes English BERT with word association

https://aclweb.org/anthology/W19-3823/

  
## AutoKeras_image_regression.ipynb

      
AutoKeras Image Regression (gist by mapmeld, created April 28, 2020)
    
## yolo.py
# -*- coding: utf-8 -*-
"""
Class definition of YOLO_v3 style detection model on image and video
"""

import colorsys
import os
from timeit import default_timer as timer

import numpy as np