Nick Doiron mapmeld

mapmeld /
Last active Mar 25, 2020 — forked from W4ngatang/
Script for downloading data of the GLUE benchmark
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data and extract the files from it manually.
On Windows you can run the .msi installer directly; on Mac and Linux, use an external tool such as 'cabextract' (see below for an example).
You should then rename specific extracted files and place them in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
mapmeld /
Last active Feb 28, 2020
Nevada delegate issues

Assuming the final delegate counts and viability number are correct


  • Carson City 107: extra delegate, Biden's 2nd
  • Carson City 407: delegate should have been added to Biden, not Klobuchar
  • Clark 1621: needs to add 1 leftover delegate each to Buttigieg and Sanders
  • Clark 1642: unclear, assigned too many delegates instead of a +1 to Sanders
  • Clark 1643: removed Klobuchar's 1 delegate to match expected delegates, even though viable; all had 1 delegate
  • Clark 1645: removed Warren's 1 delegate though viable
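The corrections above all follow from the standard caucus allocation pattern: each viable group gets its proportional share of the precinct's delegates, and leftover delegates go to the groups with the largest remainders. A minimal sketch of that pattern (the floor-then-largest-remainder tie-breaking here is an assumption for illustration, not Nevada's official procedure):

```python
# Largest-remainder delegate allocation: a sketch, not the official NV rule.
def allocate(delegates, counts, viability):
    # drop groups below the viability threshold
    viable = {c: n for c, n in counts.items() if n >= viability}
    total = sum(viable.values())
    # raw proportional share for each viable group
    raw = {c: delegates * n / total for c, n in viable.items()}
    awarded = {c: int(r) for c, r in raw.items()}  # floor of each share
    leftover = delegates - sum(awarded.values())
    # hand leftover delegates to the largest fractional remainders
    for c in sorted(viable, key=lambda c: raw[c] - awarded[c], reverse=True)[:leftover]:
        awarded[c] += 1
    return awarded
```

Running this against a precinct's counts makes it easy to spot an "extra delegate" or a leftover delegate assigned to the wrong candidate.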
mapmeld /
Created Jan 8, 2020
Count number of saved plans
# calculate number of plans, by state
import json

plans = open('districtr_full_export.json', 'r').read().strip().split("\n")
places = {}
for raw in plans:
    plan = json.loads(raw)
    if ("plan" in plan) and ("placeId" in plan["plan"]):
        place = plan["plan"]["placeId"]
        if place in places:
            places[place] += 1
        else:
            places[place] = 1
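The same tally can be written more compactly with `collections.Counter`; a sketch assuming the same one-JSON-object-per-line export format:

```python
import json
from collections import Counter

def count_plans(lines):
    # tally how many saved plans reference each placeId
    places = Counter()
    for raw in lines:
        plan = json.loads(raw)
        if "plan" in plan and "placeId" in plan["plan"]:
            places[plan["plan"]["placeId"]] += 1
    return places
```

`Counter` also gives `most_common()` for free when ranking states by plan count.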
mapmeld /
Last active Jan 5, 2020
first-draft qa
from allennlp.predictors import Predictor
from transformers.tokenization_gpt2 import GPT2Tokenizer
from transformers import pipeline

class HuggingFacePredictor(Predictor):
    def __init__(self) -> None:
        self.tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
        self.model = pipeline('question-answering')

    def predict(self, passage='', question=''):
        return self.model(question=question, context=passage)

import json
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("")
qas = open("simplified-nq-test.jsonl").read().strip().split("\n")
for qa in qas:
    rep = json.loads(qa)
    best = rep['long_answer_candidates'][0]
    print('AllenNLP: ')
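For context, each line of `simplified-nq-test.jsonl` is one JSON record whose `long_answer_candidates` are token offsets into `document_text`. A hedged sketch of recovering the first candidate's text (field names follow the simplified Natural Questions release; the sample record in the test is invented):

```python
import json

def first_candidate_text(jsonl_line):
    # recover the first long-answer candidate as a span of whitespace tokens
    rep = json.loads(jsonl_line)
    tokens = rep['document_text'].split(' ')
    best = rep['long_answer_candidates'][0]
    return ' '.join(tokens[best['start_token']:best['end_token']])
```

That recovered span is what you would feed to either predictor as the passage.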
mapmeld /
Created Jan 2, 2020
State-specific maps of Native American Communities
from sys import argv
import json

# pip install fiona shapely shapely-geojson
import fiona
from shapely.geometry import shape
from shapely_geojson import dumps

if len(argv) < 2:
    print('usage: "New Mexico" > output.geojson')
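The `shapely_geojson.dumps` helper can be swapped for the stock `shapely.geometry.mapping` plus `json.dumps`, which avoids the extra dependency. A sketch assuming shapely is installed (the point geometry is invented):

```python
import json
from shapely.geometry import shape, mapping

# round-trip a GeoJSON-style geometry dict through shapely
geom = shape({'type': 'Point', 'coordinates': [-106.0, 35.0]})
geojson = json.dumps(mapping(geom))
```

The same `mapping`/`json.dumps` pair works on any shapely geometry read from fiona records.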
mapmeld /
Last active Dec 30, 2019

The number of awesome ML projects is limitless, but this list collects the ideas I grouped together as both awesome and seemingly achievable:

Open-ended Datasets

import pandas as pd

for lang in ['ar', 'en', 'ru', 'ja', 'tr', 'fa']:
    mentionsum = {}
    for doc in range(1, 10):  # ends at 9
        df = pd.read_csv("saudi_arabia_112019_tweets_csv_hashed_" + str(doc) + ".csv")
        rows = df[df['tweet_language'] == lang][['user_mentions']].values.tolist()
        df = None  # clear memory
        for row in rows:
            mentions = row[0].replace('[', '').replace(']', '').replace('\'', '').split(', ')
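The string surgery on `user_mentions` works because the CSV stores the column as the repr of a Python list; `ast.literal_eval` parses that form directly and survives commas or brackets inside values. A sketch, assuming cells really look like `"['a', 'b']"`:

```python
import ast

def parse_mentions(cell):
    # the CSV stores user_mentions as the repr of a Python list
    if not cell or cell == '[]':
        return []
    return [str(m) for m in ast.literal_eval(cell)]
```

Unlike `eval`, `literal_eval` only accepts Python literals, so a malformed cell raises instead of executing anything.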
mapmeld /
Last active Dec 29, 2019
import pandas as pd

dflangsum = None
for doc in range(1, 10):  # ends at 9
    df = pd.read_csv("saudi_arabia_112019_tweets_csv_hashed_" + str(doc) + ".csv")
    langcount = df[df['is_retweet'] == False].groupby(['tweet_language']).count()['tweetid']
    if dflangsum is not None:
        # align on language and treat languages missing from one side as zero
        dflangsum = dflangsum.add(langcount, fill_value=0)
    else:
        dflangsum = langcount
    df = None  # free memory
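An alternative to accumulating a running Series is to collect each file's counts and sum them in one step; `pd.concat` aligns the language index and `.sum` skips the `NaN`s left by languages missing from some files. A sketch on invented in-memory frames:

```python
import pandas as pd

def total_lang_counts(frames):
    # per-file counts of original (non-retweet) tweets by language,
    # summed across files so missing languages count as zero
    counts = [
        df[df['is_retweet'] == False].groupby('tweet_language').count()['tweetid']
        for df in frames
    ]
    return pd.concat(counts, axis=1).sum(axis=1)
```

This keeps only one file's counts in memory at a time if `frames` is a generator of freshly read DataFrames.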
# BASH dependencies
apt-get install python-opencv ffmpeg
pip install keras numpy shap matplotlib pillow
rm ./drive/My\ Drive/mlin/training/*/*.jpg
rm ./drive/My\ Drive/mlin/validation/*/*.jpg
# native imports