@rogerallen
Last active December 19, 2021 20:48
Code to scrape Wikipedia for iso3166 data
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mapping Countries & States to 2-Letter Codes with Python Dictionaries\n",
"\n",
"September, 2021\n",
"\n",
"I've been surprised by how many comments I get on my gists of Python dictionaries for mapping states and countries to 2-letter codes: [states](https://gist.github.com/rogerallen/1583593) and [countries](https://gist.github.com/rogerallen/1583606). Who knows, this might be the most popular code I've ever written? :-)\n",
"\n",
"So I decided to automate keeping them up to date. New countries are occasionally created, states and provinces change, and people always want slight tweaks to fit their own use cases. Wikipedia itself also changes over time, which can require updates to this code.\n",
"\n",
"Looking around, I found that Wikipedia not only has a list of country codes, but also links to every country's subdivisions. Using `requests`, `BeautifulSoup` and `pandas`, I was able to automate scraping the site to keep these gists up to date.\n",
"\n",
"Feel free to use this code for your own purposes. It is dedicated to the public domain. To the extent possible under law, Roger Allen has waived all copyright and related or neighboring rights to this code.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"from bs4 import BeautifulSoup\n",
"import pandas as pd\n",
"import datetime"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"this_code_gist_url = 'https://gist.github.com/rogerallen/d75440e8e5ea4762374dfd5c1ddf84e0'\n",
"\n",
"# We can't get this from ISO, so let's use Wikipedia\n",
"site = 'https://en.wikipedia.org'\n",
"iso3166_url = site+'/wiki/ISO_3166-1_alpha-2'\n",
"\n",
"def get_iso3166_countries():\n",
"    \"\"\"Get the Country table from Wikipedia from the URL https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2\n",
"    inside the section 'Officially assigned code elements' which contains the 3rd table with columns labelled\n",
"    Code, Country name, Year, ccTLD, ISO 3166-2 and Notes.\n",
"\n",
"    Reads the table and returns it as a Pandas DataFrame.\n",
"    \"\"\"\n",
"    THE_TABLE = 2 # !!! Assumes required table is 3rd one !!!\n",
"    try:\n",
"        # grab web page\n",
"        url = iso3166_url\n",
"        r = requests.get(url)\n",
"        r.raise_for_status()\n",
"        # convert to soup\n",
"        soup = BeautifulSoup(r.text, 'html.parser')\n",
"        # find THE_TABLE\n",
"        for i,table in enumerate(soup.find_all('table')):\n",
"            if i == THE_TABLE:\n",
"                break\n",
"        # iterate the table gathering data into data dictionary\n",
"        have_header = False\n",
"        columns = {} # column number -> column label\n",
"        data = {} # table's data, dict of lists\n",
"        for i,tr in enumerate(table.find_all('tr')):\n",
"            if not have_header:\n",
"                # read the headers into columns dict & set up data dict\n",
"                have_header = True\n",
"                for j,th in enumerate(tr.find_all('th')):\n",
"                    columns[j] = th.text.replace(' (using title case)','').strip()\n",
"                for col in columns.values():\n",
"                    data[col] = []\n",
"            else:\n",
"                # grab text for each td in tr, except for ISO 3166-2, where we grab the link\n",
"                for j,td in enumerate(tr.find_all('td')):\n",
"                    if columns[j] != 'ISO 3166-2':\n",
"                        data[columns[j]].append(td.text)\n",
"                    else:\n",
"                        a = td.find('a')\n",
"                        data[columns[j]].append(a.get('href'))\n",
"        # convert dictionary to dataframe\n",
"        return pd.DataFrame(data)\n",
"    except requests.HTTPError:\n",
"        print(f\"ERROR: {r.status_code} accessing url: {url}\")\n",
"\n",
"def get_iso3166_subdivisions(country_df,country_code):\n",
"    \"\"\"The country dataframe has links to Wikipedia subdivision tables for each country.\n",
"    E.g. for the U.S. it has States & Territories. Return a Pandas DataFrame containing\n",
"    this data.\"\"\"\n",
"    try:\n",
"        url = site + country_df[country_df['Code'] == country_code]['ISO 3166-2'].values[0]\n",
"        r = requests.get(url)\n",
"        r.raise_for_status()\n",
"        # convert to soup\n",
"        soup = BeautifulSoup(r.text, 'html.parser')\n",
"        # assume the first table on the page contains the data.\n",
"        # iterate the table gathering data into data dictionary\n",
"        have_header = False\n",
"        columns = {} # column number -> column label\n",
"        data = {} # table's data, dict of lists\n",
"        table = soup.find('table')\n",
"        for i,tr in enumerate(table.find_all('tr')):\n",
"            if not have_header:\n",
"                # read the headers into columns dict & set up data dict\n",
"                have_header = True\n",
"                for j,th in enumerate(tr.find_all('th')):\n",
"                    columns[j] = th.text.strip()\n",
"                for col in columns.values():\n",
"                    data[col] = []\n",
"            else:\n",
"                # grab text for each td in tr\n",
"                for j,td in enumerate(tr.find_all('td')):\n",
"                    data[columns[j]].append(td.text.strip())\n",
"        # convert dictionary to dataframe\n",
"        return pd.DataFrame(data)\n",
"    except requests.HTTPError:\n",
"        print(f\"ERROR: {r.status_code} accessing url: {url}\")\n",
"\n",
"def print_country_gist():\n",
"    now_str = datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n",
"    country_df = get_iso3166_countries()\n",
"    # print header\n",
"    print(f\"\"\"# Python Dictionary to translate Countries to Two-Letter codes and vice versa.\n",
"#\n",
"# https://gist.github.com/rogerallen/1583606\n",
"#\n",
"# Dedicated to the public domain. To the extent possible under law,\n",
"# Roger Allen has waived all copyright and related or neighboring\n",
"# rights to this code. Data originally from Wikipedia at the url:\n",
"# {iso3166_url}\n",
"#\n",
"# Automatically Generated {now_str} via Jupyter Notebook from\n",
"# {this_code_gist_url} \n",
"\n",
"country_to_abbrev = {{\"\"\")\n",
"    # print countries\n",
"    for i in range(country_df.shape[0]):\n",
"        print(f' \"{country_df.iloc[i][\"Country name\"]}\": \"{country_df.iloc[i][\"Code\"]}\",')\n",
"    # print footer\n",
"    print(\"\"\"}\n",
" \n",
"# invert the dictionary\n",
"abbrev_to_country = dict(map(reversed, country_to_abbrev.items()))\n",
"\n",
"\"\"\")\n",
"\n",
"def print_us_gist():\n",
"    now_str = datetime.datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")\n",
"    country_df = get_iso3166_countries()\n",
"    country_code = 'US'\n",
"    us_df = get_iso3166_subdivisions(country_df,country_code)\n",
"    url = site + country_df[country_df['Code'] == country_code]['ISO 3166-2'].values[0]\n",
"    # print header\n",
"    print(f\"\"\"# United States of America Python Dictionary to translate States,\n",
"# Districts & Territories to Two-Letter codes and vice versa.\n",
"#\n",
"# Canonical URL: https://gist.github.com/rogerallen/1583593\n",
"#\n",
"# Dedicated to the public domain. To the extent possible under law,\n",
"# Roger Allen has waived all copyright and related or neighboring\n",
"# rights to this code. Data originally from Wikipedia at the url:\n",
"# {url}\n",
"#\n",
"# Automatically Generated {now_str} via Jupyter Notebook from\n",
"# {this_code_gist_url} \n",
"\n",
"us_state_to_abbrev = {{\"\"\")\n",
"    # print states, districts & territories\n",
"    for i in range(us_df.shape[0]):\n",
"        state = us_df.iloc[i][\"Subdivision name (en)\"]\n",
"        abbrev = us_df.iloc[i][\"Code\"].split('-')[1]\n",
"        print(f' \"{state}\": \"{abbrev}\",')\n",
"    # print footer\n",
"    print(\"\"\"}\n",
" \n",
"# invert the dictionary\n",
"abbrev_to_us_state = dict(map(reversed, us_state_to_abbrev.items()))\n",
"\n",
"\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# Python Dictionary to translate Countries to Two-Letter codes and vice versa.\n",
"#\n",
"# https://gist.github.com/rogerallen/1583606\n",
"#\n",
"# Dedicated to the public domain. To the extent possible under law,\n",
"# Roger Allen has waived all copyright and related or neighboring\n",
"# rights to this code. Data originally from Wikipedia at the url:\n",
"# https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2\n",
"#\n",
"# Automatically Generated 2021-09-11 18:04:35 via Jupyter Notebook from\n",
"# https://gist.github.com/rogerallen/d75440e8e5ea4762374dfd5c1ddf84e0 \n",
"\n",
"country_to_abbrev = {\n",
" \"Andorra\": \"AD\",\n",
" \"United Arab Emirates\": \"AE\",\n",
" \"Afghanistan\": \"AF\",\n",
" \"Antigua and Barbuda\": \"AG\",\n",
" \"Anguilla\": \"AI\",\n",
" \"Albania\": \"AL\",\n",
" \"Armenia\": \"AM\",\n",
" \"Angola\": \"AO\",\n",
" \"Antarctica\": \"AQ\",\n",
" \"Argentina\": \"AR\",\n",
" \"American Samoa\": \"AS\",\n",
" \"Austria\": \"AT\",\n",
" \"Australia\": \"AU\",\n",
" \"Aruba\": \"AW\",\n",
" \"Åland Islands\": \"AX\",\n",
" \"Azerbaijan\": \"AZ\",\n",
" \"Bosnia and Herzegovina\": \"BA\",\n",
" \"Barbados\": \"BB\",\n",
" \"Bangladesh\": \"BD\",\n",
" \"Belgium\": \"BE\",\n",
" \"Burkina Faso\": \"BF\",\n",
" \"Bulgaria\": \"BG\",\n",
" \"Bahrain\": \"BH\",\n",
" \"Burundi\": \"BI\",\n",
" \"Benin\": \"BJ\",\n",
" \"Saint Barthélemy\": \"BL\",\n",
" \"Bermuda\": \"BM\",\n",
" \"Brunei Darussalam\": \"BN\",\n",
" \"Bolivia (Plurinational State of)\": \"BO\",\n",
" \"Bonaire, Sint Eustatius and Saba\": \"BQ\",\n",
" \"Brazil\": \"BR\",\n",
" \"Bahamas\": \"BS\",\n",
" \"Bhutan\": \"BT\",\n",
" \"Bouvet Island\": \"BV\",\n",
" \"Botswana\": \"BW\",\n",
" \"Belarus\": \"BY\",\n",
" \"Belize\": \"BZ\",\n",
" \"Canada\": \"CA\",\n",
" \"Cocos (Keeling) Islands\": \"CC\",\n",
" \"Congo, Democratic Republic of the\": \"CD\",\n",
" \"Central African Republic\": \"CF\",\n",
" \"Congo\": \"CG\",\n",
" \"Switzerland\": \"CH\",\n",
" \"Côte d'Ivoire\": \"CI\",\n",
" \"Cook Islands\": \"CK\",\n",
" \"Chile\": \"CL\",\n",
" \"Cameroon\": \"CM\",\n",
" \"China\": \"CN\",\n",
" \"Colombia\": \"CO\",\n",
" \"Costa Rica\": \"CR\",\n",
" \"Cuba\": \"CU\",\n",
" \"Cabo Verde\": \"CV\",\n",
" \"Curaçao\": \"CW\",\n",
" \"Christmas Island\": \"CX\",\n",
" \"Cyprus\": \"CY\",\n",
" \"Czechia\": \"CZ\",\n",
" \"Germany\": \"DE\",\n",
" \"Djibouti\": \"DJ\",\n",
" \"Denmark\": \"DK\",\n",
" \"Dominica\": \"DM\",\n",
" \"Dominican Republic\": \"DO\",\n",
" \"Algeria\": \"DZ\",\n",
" \"Ecuador\": \"EC\",\n",
" \"Estonia\": \"EE\",\n",
" \"Egypt\": \"EG\",\n",
" \"Western Sahara\": \"EH\",\n",
" \"Eritrea\": \"ER\",\n",
" \"Spain\": \"ES\",\n",
" \"Ethiopia\": \"ET\",\n",
" \"Finland\": \"FI\",\n",
" \"Fiji\": \"FJ\",\n",
" \"Falkland Islands (Malvinas)\": \"FK\",\n",
" \"Micronesia (Federated States of)\": \"FM\",\n",
" \"Faroe Islands\": \"FO\",\n",
" \"France\": \"FR\",\n",
" \"Gabon\": \"GA\",\n",
" \"United Kingdom of Great Britain and Northern Ireland\": \"GB\",\n",
" \"Grenada\": \"GD\",\n",
" \"Georgia\": \"GE\",\n",
" \"French Guiana\": \"GF\",\n",
" \"Guernsey\": \"GG\",\n",
" \"Ghana\": \"GH\",\n",
" \"Gibraltar\": \"GI\",\n",
" \"Greenland\": \"GL\",\n",
" \"Gambia\": \"GM\",\n",
" \"Guinea\": \"GN\",\n",
" \"Guadeloupe\": \"GP\",\n",
" \"Equatorial Guinea\": \"GQ\",\n",
" \"Greece\": \"GR\",\n",
" \"South Georgia and the South Sandwich Islands\": \"GS\",\n",
" \"Guatemala\": \"GT\",\n",
" \"Guam\": \"GU\",\n",
" \"Guinea-Bissau\": \"GW\",\n",
" \"Guyana\": \"GY\",\n",
" \"Hong Kong\": \"HK\",\n",
" \"Heard Island and McDonald Islands\": \"HM\",\n",
" \"Honduras\": \"HN\",\n",
" \"Croatia\": \"HR\",\n",
" \"Haiti\": \"HT\",\n",
" \"Hungary\": \"HU\",\n",
" \"Indonesia\": \"ID\",\n",
" \"Ireland\": \"IE\",\n",
" \"Israel\": \"IL\",\n",
" \"Isle of Man\": \"IM\",\n",
" \"India\": \"IN\",\n",
" \"British Indian Ocean Territory\": \"IO\",\n",
" \"Iraq\": \"IQ\",\n",
" \"Iran (Islamic Republic of)\": \"IR\",\n",
" \"Iceland\": \"IS\",\n",
" \"Italy\": \"IT\",\n",
" \"Jersey\": \"JE\",\n",
" \"Jamaica\": \"JM\",\n",
" \"Jordan\": \"JO\",\n",
" \"Japan\": \"JP\",\n",
" \"Kenya\": \"KE\",\n",
" \"Kyrgyzstan\": \"KG\",\n",
" \"Cambodia\": \"KH\",\n",
" \"Kiribati\": \"KI\",\n",
" \"Comoros\": \"KM\",\n",
" \"Saint Kitts and Nevis\": \"KN\",\n",
" \"Korea (Democratic People's Republic of)\": \"KP\",\n",
" \"Korea, Republic of\": \"KR\",\n",
" \"Kuwait\": \"KW\",\n",
" \"Cayman Islands\": \"KY\",\n",
" \"Kazakhstan\": \"KZ\",\n",
" \"Lao People's Democratic Republic\": \"LA\",\n",
" \"Lebanon\": \"LB\",\n",
" \"Saint Lucia\": \"LC\",\n",
" \"Liechtenstein\": \"LI\",\n",
" \"Sri Lanka\": \"LK\",\n",
" \"Liberia\": \"LR\",\n",
" \"Lesotho\": \"LS\",\n",
" \"Lithuania\": \"LT\",\n",
" \"Luxembourg\": \"LU\",\n",
" \"Latvia\": \"LV\",\n",
" \"Libya\": \"LY\",\n",
" \"Morocco\": \"MA\",\n",
" \"Monaco\": \"MC\",\n",
" \"Moldova, Republic of\": \"MD\",\n",
" \"Montenegro\": \"ME\",\n",
" \"Saint Martin (French part)\": \"MF\",\n",
" \"Madagascar\": \"MG\",\n",
" \"Marshall Islands\": \"MH\",\n",
" \"North Macedonia\": \"MK\",\n",
" \"Mali\": \"ML\",\n",
" \"Myanmar\": \"MM\",\n",
" \"Mongolia\": \"MN\",\n",
" \"Macao\": \"MO\",\n",
" \"Northern Mariana Islands\": \"MP\",\n",
" \"Martinique\": \"MQ\",\n",
" \"Mauritania\": \"MR\",\n",
" \"Montserrat\": \"MS\",\n",
" \"Malta\": \"MT\",\n",
" \"Mauritius\": \"MU\",\n",
" \"Maldives\": \"MV\",\n",
" \"Malawi\": \"MW\",\n",
" \"Mexico\": \"MX\",\n",
" \"Malaysia\": \"MY\",\n",
" \"Mozambique\": \"MZ\",\n",
" \"Namibia\": \"NA\",\n",
" \"New Caledonia\": \"NC\",\n",
" \"Niger\": \"NE\",\n",
" \"Norfolk Island\": \"NF\",\n",
" \"Nigeria\": \"NG\",\n",
" \"Nicaragua\": \"NI\",\n",
" \"Netherlands\": \"NL\",\n",
" \"Norway\": \"NO\",\n",
" \"Nepal\": \"NP\",\n",
" \"Nauru\": \"NR\",\n",
" \"Niue\": \"NU\",\n",
" \"New Zealand\": \"NZ\",\n",
" \"Oman\": \"OM\",\n",
" \"Panama\": \"PA\",\n",
" \"Peru\": \"PE\",\n",
" \"French Polynesia\": \"PF\",\n",
" \"Papua New Guinea\": \"PG\",\n",
" \"Philippines\": \"PH\",\n",
" \"Pakistan\": \"PK\",\n",
" \"Poland\": \"PL\",\n",
" \"Saint Pierre and Miquelon\": \"PM\",\n",
" \"Pitcairn\": \"PN\",\n",
" \"Puerto Rico\": \"PR\",\n",
" \"Palestine, State of\": \"PS\",\n",
" \"Portugal\": \"PT\",\n",
" \"Palau\": \"PW\",\n",
" \"Paraguay\": \"PY\",\n",
" \"Qatar\": \"QA\",\n",
" \"Réunion\": \"RE\",\n",
" \"Romania\": \"RO\",\n",
" \"Serbia\": \"RS\",\n",
" \"Russian Federation\": \"RU\",\n",
" \"Rwanda\": \"RW\",\n",
" \"Saudi Arabia\": \"SA\",\n",
" \"Solomon Islands\": \"SB\",\n",
" \"Seychelles\": \"SC\",\n",
" \"Sudan\": \"SD\",\n",
" \"Sweden\": \"SE\",\n",
" \"Singapore\": \"SG\",\n",
" \"Saint Helena, Ascension and Tristan da Cunha\": \"SH\",\n",
" \"Slovenia\": \"SI\",\n",
" \"Svalbard and Jan Mayen\": \"SJ\",\n",
" \"Slovakia\": \"SK\",\n",
" \"Sierra Leone\": \"SL\",\n",
" \"San Marino\": \"SM\",\n",
" \"Senegal\": \"SN\",\n",
" \"Somalia\": \"SO\",\n",
" \"Suriname\": \"SR\",\n",
" \"South Sudan\": \"SS\",\n",
" \"Sao Tome and Principe\": \"ST\",\n",
" \"El Salvador\": \"SV\",\n",
" \"Sint Maarten (Dutch part)\": \"SX\",\n",
" \"Syrian Arab Republic\": \"SY\",\n",
" \"Eswatini\": \"SZ\",\n",
" \"Turks and Caicos Islands\": \"TC\",\n",
" \"Chad\": \"TD\",\n",
" \"French Southern Territories\": \"TF\",\n",
" \"Togo\": \"TG\",\n",
" \"Thailand\": \"TH\",\n",
" \"Tajikistan\": \"TJ\",\n",
" \"Tokelau\": \"TK\",\n",
" \"Timor-Leste\": \"TL\",\n",
" \"Turkmenistan\": \"TM\",\n",
" \"Tunisia\": \"TN\",\n",
" \"Tonga\": \"TO\",\n",
" \"Turkey\": \"TR\",\n",
" \"Trinidad and Tobago\": \"TT\",\n",
" \"Tuvalu\": \"TV\",\n",
" \"Taiwan, Province of China\": \"TW\",\n",
" \"Tanzania, United Republic of\": \"TZ\",\n",
" \"Ukraine\": \"UA\",\n",
" \"Uganda\": \"UG\",\n",
" \"United States Minor Outlying Islands\": \"UM\",\n",
" \"United States of America\": \"US\",\n",
" \"Uruguay\": \"UY\",\n",
" \"Uzbekistan\": \"UZ\",\n",
" \"Holy See\": \"VA\",\n",
" \"Saint Vincent and the Grenadines\": \"VC\",\n",
" \"Venezuela (Bolivarian Republic of)\": \"VE\",\n",
" \"Virgin Islands (British)\": \"VG\",\n",
" \"Virgin Islands (U.S.)\": \"VI\",\n",
" \"Viet Nam\": \"VN\",\n",
" \"Vanuatu\": \"VU\",\n",
" \"Wallis and Futuna\": \"WF\",\n",
" \"Samoa\": \"WS\",\n",
" \"Yemen\": \"YE\",\n",
" \"Mayotte\": \"YT\",\n",
" \"South Africa\": \"ZA\",\n",
" \"Zambia\": \"ZM\",\n",
" \"Zimbabwe\": \"ZW\",\n",
"}\n",
" \n",
"# invert the dictionary\n",
"abbrev_to_country = dict(map(reversed, country_to_abbrev.items()))\n",
"\n",
"\n"
]
}
],
"source": [
"print_country_gist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Paste the code generated above to check it"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Python Dictionary to translate Countries to Two-Letter codes and vice versa.\n",
"#\n",
"# https://gist.github.com/rogerallen/1583606\n",
"#\n",
"# Dedicated to the public domain. To the extent possible under law,\n",
"# Roger Allen has waived all copyright and related or neighboring\n",
"# rights to this code. Data originally from Wikipedia at the url:\n",
"# https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2\n",
"#\n",
"# Automatically Generated 2021-09-11 18:04:35 via Jupyter Notebook from\n",
"# https://gist.github.com/rogerallen/d75440e8e5ea4762374dfd5c1ddf84e0 \n",
"\n",
"country_to_abbrev = {\n",
" \"Andorra\": \"AD\",\n",
" \"United Arab Emirates\": \"AE\",\n",
" \"Afghanistan\": \"AF\",\n",
" \"Antigua and Barbuda\": \"AG\",\n",
" \"Anguilla\": \"AI\",\n",
" \"Albania\": \"AL\",\n",
" \"Armenia\": \"AM\",\n",
" \"Angola\": \"AO\",\n",
" \"Antarctica\": \"AQ\",\n",
" \"Argentina\": \"AR\",\n",
" \"American Samoa\": \"AS\",\n",
" \"Austria\": \"AT\",\n",
" \"Australia\": \"AU\",\n",
" \"Aruba\": \"AW\",\n",
" \"Åland Islands\": \"AX\",\n",
" \"Azerbaijan\": \"AZ\",\n",
" \"Bosnia and Herzegovina\": \"BA\",\n",
" \"Barbados\": \"BB\",\n",
" \"Bangladesh\": \"BD\",\n",
" \"Belgium\": \"BE\",\n",
" \"Burkina Faso\": \"BF\",\n",
" \"Bulgaria\": \"BG\",\n",
" \"Bahrain\": \"BH\",\n",
" \"Burundi\": \"BI\",\n",
" \"Benin\": \"BJ\",\n",
" \"Saint Barthélemy\": \"BL\",\n",
" \"Bermuda\": \"BM\",\n",
" \"Brunei Darussalam\": \"BN\",\n",
" \"Bolivia (Plurinational State of)\": \"BO\",\n",
" \"Bonaire, Sint Eustatius and Saba\": \"BQ\",\n",
" \"Brazil\": \"BR\",\n",
" \"Bahamas\": \"BS\",\n",
" \"Bhutan\": \"BT\",\n",
" \"Bouvet Island\": \"BV\",\n",
" \"Botswana\": \"BW\",\n",
" \"Belarus\": \"BY\",\n",
" \"Belize\": \"BZ\",\n",
" \"Canada\": \"CA\",\n",
" \"Cocos (Keeling) Islands\": \"CC\",\n",
" \"Congo, Democratic Republic of the\": \"CD\",\n",
" \"Central African Republic\": \"CF\",\n",
" \"Congo\": \"CG\",\n",
" \"Switzerland\": \"CH\",\n",
" \"Côte d'Ivoire\": \"CI\",\n",
" \"Cook Islands\": \"CK\",\n",
" \"Chile\": \"CL\",\n",
" \"Cameroon\": \"CM\",\n",
" \"China\": \"CN\",\n",
" \"Colombia\": \"CO\",\n",
" \"Costa Rica\": \"CR\",\n",
" \"Cuba\": \"CU\",\n",
" \"Cabo Verde\": \"CV\",\n",
" \"Curaçao\": \"CW\",\n",
" \"Christmas Island\": \"CX\",\n",
" \"Cyprus\": \"CY\",\n",
" \"Czechia\": \"CZ\",\n",
" \"Germany\": \"DE\",\n",
" \"Djibouti\": \"DJ\",\n",
" \"Denmark\": \"DK\",\n",
" \"Dominica\": \"DM\",\n",
" \"Dominican Republic\": \"DO\",\n",
" \"Algeria\": \"DZ\",\n",
" \"Ecuador\": \"EC\",\n",
" \"Estonia\": \"EE\",\n",
" \"Egypt\": \"EG\",\n",
" \"Western Sahara\": \"EH\",\n",
" \"Eritrea\": \"ER\",\n",
" \"Spain\": \"ES\",\n",
" \"Ethiopia\": \"ET\",\n",
" \"Finland\": \"FI\",\n",
" \"Fiji\": \"FJ\",\n",
" \"Falkland Islands (Malvinas)\": \"FK\",\n",
" \"Micronesia (Federated States of)\": \"FM\",\n",
" \"Faroe Islands\": \"FO\",\n",
" \"France\": \"FR\",\n",
" \"Gabon\": \"GA\",\n",
" \"United Kingdom of Great Britain and Northern Ireland\": \"GB\",\n",
" \"Grenada\": \"GD\",\n",
" \"Georgia\": \"GE\",\n",
" \"French Guiana\": \"GF\",\n",
" \"Guernsey\": \"GG\",\n",
" \"Ghana\": \"GH\",\n",
" \"Gibraltar\": \"GI\",\n",
" \"Greenland\": \"GL\",\n",
" \"Gambia\": \"GM\",\n",
" \"Guinea\": \"GN\",\n",
" \"Guadeloupe\": \"GP\",\n",
" \"Equatorial Guinea\": \"GQ\",\n",
" \"Greece\": \"GR\",\n",
" \"South Georgia and the South Sandwich Islands\": \"GS\",\n",
" \"Guatemala\": \"GT\",\n",
" \"Guam\": \"GU\",\n",
" \"Guinea-Bissau\": \"GW\",\n",
" \"Guyana\": \"GY\",\n",
" \"Hong Kong\": \"HK\",\n",
" \"Heard Island and McDonald Islands\": \"HM\",\n",
" \"Honduras\": \"HN\",\n",
" \"Croatia\": \"HR\",\n",
" \"Haiti\": \"HT\",\n",
" \"Hungary\": \"HU\",\n",
" \"Indonesia\": \"ID\",\n",
" \"Ireland\": \"IE\",\n",
" \"Israel\": \"IL\",\n",
" \"Isle of Man\": \"IM\",\n",
" \"India\": \"IN\",\n",
" \"British Indian Ocean Territory\": \"IO\",\n",
" \"Iraq\": \"IQ\",\n",
" \"Iran (Islamic Republic of)\": \"IR\",\n",
" \"Iceland\": \"IS\",\n",
" \"Italy\": \"IT\",\n",
" \"Jersey\": \"JE\",\n",
" \"Jamaica\": \"JM\",\n",
" \"Jordan\": \"JO\",\n",
" \"Japan\": \"JP\",\n",
" \"Kenya\": \"KE\",\n",
" \"Kyrgyzstan\": \"KG\",\n",
" \"Cambodia\": \"KH\",\n",
" \"Kiribati\": \"KI\",\n",
" \"Comoros\": \"KM\",\n",
" \"Saint Kitts and Nevis\": \"KN\",\n",
" \"Korea (Democratic People's Republic of)\": \"KP\",\n",
" \"Korea, Republic of\": \"KR\",\n",
" \"Kuwait\": \"KW\",\n",
" \"Cayman Islands\": \"KY\",\n",
" \"Kazakhstan\": \"KZ\",\n",
" \"Lao People's Democratic Republic\": \"LA\",\n",
" \"Lebanon\": \"LB\",\n",
" \"Saint Lucia\": \"LC\",\n",
" \"Liechtenstein\": \"LI\",\n",
" \"Sri Lanka\": \"LK\",\n",
" \"Liberia\": \"LR\",\n",
" \"Lesotho\": \"LS\",\n",
" \"Lithuania\": \"LT\",\n",
" \"Luxembourg\": \"LU\",\n",
" \"Latvia\": \"LV\",\n",
" \"Libya\": \"LY\",\n",
" \"Morocco\": \"MA\",\n",
" \"Monaco\": \"MC\",\n",
" \"Moldova, Republic of\": \"MD\",\n",
" \"Montenegro\": \"ME\",\n",
" \"Saint Martin (French part)\": \"MF\",\n",
" \"Madagascar\": \"MG\",\n",
" \"Marshall Islands\": \"MH\",\n",
" \"North Macedonia\": \"MK\",\n",
" \"Mali\": \"ML\",\n",
" \"Myanmar\": \"MM\",\n",
" \"Mongolia\": \"MN\",\n",
" \"Macao\": \"MO\",\n",
" \"Northern Mariana Islands\": \"MP\",\n",
" \"Martinique\": \"MQ\",\n",
" \"Mauritania\": \"MR\",\n",
" \"Montserrat\": \"MS\",\n",
" \"Malta\": \"MT\",\n",
" \"Mauritius\": \"MU\",\n",
" \"Maldives\": \"MV\",\n",
" \"Malawi\": \"MW\",\n",
" \"Mexico\": \"MX\",\n",
" \"Malaysia\": \"MY\",\n",
" \"Mozambique\": \"MZ\",\n",
" \"Namibia\": \"NA\",\n",
" \"New Caledonia\": \"NC\",\n",
" \"Niger\": \"NE\",\n",
" \"Norfolk Island\": \"NF\",\n",
" \"Nigeria\": \"NG\",\n",
" \"Nicaragua\": \"NI\",\n",
" \"Netherlands\": \"NL\",\n",
" \"Norway\": \"NO\",\n",
" \"Nepal\": \"NP\",\n",
" \"Nauru\": \"NR\",\n",
" \"Niue\": \"NU\",\n",
" \"New Zealand\": \"NZ\",\n",
" \"Oman\": \"OM\",\n",
" \"Panama\": \"PA\",\n",
" \"Peru\": \"PE\",\n",
" \"French Polynesia\": \"PF\",\n",
" \"Papua New Guinea\": \"PG\",\n",
" \"Philippines\": \"PH\",\n",
" \"Pakistan\": \"PK\",\n",
" \"Poland\": \"PL\",\n",
" \"Saint Pierre and Miquelon\": \"PM\",\n",
" \"Pitcairn\": \"PN\",\n",
" \"Puerto Rico\": \"PR\",\n",
" \"Palestine, State of\": \"PS\",\n",
" \"Portugal\": \"PT\",\n",
" \"Palau\": \"PW\",\n",
" \"Paraguay\": \"PY\",\n",
" \"Qatar\": \"QA\",\n",
" \"Réunion\": \"RE\",\n",
" \"Romania\": \"RO\",\n",
" \"Serbia\": \"RS\",\n",
" \"Russian Federation\": \"RU\",\n",
" \"Rwanda\": \"RW\",\n",
" \"Saudi Arabia\": \"SA\",\n",
" \"Solomon Islands\": \"SB\",\n",
" \"Seychelles\": \"SC\",\n",
" \"Sudan\": \"SD\",\n",
" \"Sweden\": \"SE\",\n",
" \"Singapore\": \"SG\",\n",
" \"Saint Helena, Ascension and Tristan da Cunha\": \"SH\",\n",
" \"Slovenia\": \"SI\",\n",
" \"Svalbard and Jan Mayen\": \"SJ\",\n",
" \"Slovakia\": \"SK\",\n",
" \"Sierra Leone\": \"SL\",\n",
" \"San Marino\": \"SM\",\n",
" \"Senegal\": \"SN\",\n",
" \"Somalia\": \"SO\",\n",
" \"Suriname\": \"SR\",\n",
" \"South Sudan\": \"SS\",\n",
" \"Sao Tome and Principe\": \"ST\",\n",
" \"El Salvador\": \"SV\",\n",
" \"Sint Maarten (Dutch part)\": \"SX\",\n",
" \"Syrian Arab Republic\": \"SY\",\n",
" \"Eswatini\": \"SZ\",\n",
" \"Turks and Caicos Islands\": \"TC\",\n",
" \"Chad\": \"TD\",\n",
" \"French Southern Territories\": \"TF\",\n",
" \"Togo\": \"TG\",\n",
" \"Thailand\": \"TH\",\n",
" \"Tajikistan\": \"TJ\",\n",
" \"Tokelau\": \"TK\",\n",
" \"Timor-Leste\": \"TL\",\n",
" \"Turkmenistan\": \"TM\",\n",
" \"Tunisia\": \"TN\",\n",
" \"Tonga\": \"TO\",\n",
" \"Turkey\": \"TR\",\n",
" \"Trinidad and Tobago\": \"TT\",\n",
" \"Tuvalu\": \"TV\",\n",
" \"Taiwan, Province of China\": \"TW\",\n",
" \"Tanzania, United Republic of\": \"TZ\",\n",
" \"Ukraine\": \"UA\",\n",
" \"Uganda\": \"UG\",\n",
" \"United States Minor Outlying Islands\": \"UM\",\n",
" \"United States of America\": \"US\",\n",
" \"Uruguay\": \"UY\",\n",
" \"Uzbekistan\": \"UZ\",\n",
" \"Holy See\": \"VA\",\n",
" \"Saint Vincent and the Grenadines\": \"VC\",\n",
" \"Venezuela (Bolivarian Republic of)\": \"VE\",\n",
" \"Virgin Islands (British)\": \"VG\",\n",
" \"Virgin Islands (U.S.)\": \"VI\",\n",
" \"Viet Nam\": \"VN\",\n",
" \"Vanuatu\": \"VU\",\n",
" \"Wallis and Futuna\": \"WF\",\n",
" \"Samoa\": \"WS\",\n",
" \"Yemen\": \"YE\",\n",
" \"Mayotte\": \"YT\",\n",
" \"South Africa\": \"ZA\",\n",
" \"Zambia\": \"ZM\",\n",
" \"Zimbabwe\": \"ZW\",\n",
"}\n",
" \n",
"# invert the dictionary\n",
"abbrev_to_country = dict(map(reversed, country_to_abbrev.items()))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Denmark', 'DK')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# quick check\n",
"abbrev_to_country[\"DK\"], country_to_abbrev[\"Denmark\"]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# United States of America Python Dictionary to translate States,\n",
"# Districts & Territories to Two-Letter codes and vice versa.\n",
"#\n",
"# Canonical URL: https://gist.github.com/rogerallen/1583593\n",
"#\n",
"# Dedicated to the public domain. To the extent possible under law,\n",
"# Roger Allen has waived all copyright and related or neighboring\n",
"# rights to this code. Data originally from Wikipedia at the url:\n",
"# https://en.wikipedia.org/wiki/ISO_3166-2:US\n",
"#\n",
"# Automatically Generated 2021-09-11 18:04:36 via Jupyter Notebook from\n",
"# https://gist.github.com/rogerallen/d75440e8e5ea4762374dfd5c1ddf84e0 \n",
"\n",
"us_state_to_abbrev = {\n",
" \"Alabama\": \"AL\",\n",
" \"Alaska\": \"AK\",\n",
" \"Arizona\": \"AZ\",\n",
" \"Arkansas\": \"AR\",\n",
" \"California\": \"CA\",\n",
" \"Colorado\": \"CO\",\n",
" \"Connecticut\": \"CT\",\n",
" \"Delaware\": \"DE\",\n",
" \"Florida\": \"FL\",\n",
" \"Georgia\": \"GA\",\n",
" \"Hawaii\": \"HI\",\n",
" \"Idaho\": \"ID\",\n",
" \"Illinois\": \"IL\",\n",
" \"Indiana\": \"IN\",\n",
" \"Iowa\": \"IA\",\n",
" \"Kansas\": \"KS\",\n",
" \"Kentucky\": \"KY\",\n",
" \"Louisiana\": \"LA\",\n",
" \"Maine\": \"ME\",\n",
" \"Maryland\": \"MD\",\n",
" \"Massachusetts\": \"MA\",\n",
" \"Michigan\": \"MI\",\n",
" \"Minnesota\": \"MN\",\n",
" \"Mississippi\": \"MS\",\n",
" \"Missouri\": \"MO\",\n",
" \"Montana\": \"MT\",\n",
" \"Nebraska\": \"NE\",\n",
" \"Nevada\": \"NV\",\n",
" \"New Hampshire\": \"NH\",\n",
" \"New Jersey\": \"NJ\",\n",
" \"New Mexico\": \"NM\",\n",
" \"New York\": \"NY\",\n",
" \"North Carolina\": \"NC\",\n",
" \"North Dakota\": \"ND\",\n",
" \"Ohio\": \"OH\",\n",
" \"Oklahoma\": \"OK\",\n",
" \"Oregon\": \"OR\",\n",
" \"Pennsylvania\": \"PA\",\n",
" \"Rhode Island\": \"RI\",\n",
" \"South Carolina\": \"SC\",\n",
" \"South Dakota\": \"SD\",\n",
" \"Tennessee\": \"TN\",\n",
" \"Texas\": \"TX\",\n",
" \"Utah\": \"UT\",\n",
" \"Vermont\": \"VT\",\n",
" \"Virginia\": \"VA\",\n",
" \"Washington\": \"WA\",\n",
" \"West Virginia\": \"WV\",\n",
" \"Wisconsin\": \"WI\",\n",
" \"Wyoming\": \"WY\",\n",
" \"District of Columbia\": \"DC\",\n",
" \"American Samoa\": \"AS\",\n",
" \"Guam\": \"GU\",\n",
" \"Northern Mariana Islands\": \"MP\",\n",
" \"Puerto Rico\": \"PR\",\n",
" \"United States Minor Outlying Islands\": \"UM\",\n",
" \"U.S. Virgin Islands\": \"VI\",\n",
"}\n",
" \n",
"# invert the dictionary\n",
"abbrev_to_us_state = dict(map(reversed, us_state_to_abbrev.items()))\n",
"\n",
"\n"
]
}
],
"source": [
"print_us_gist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Paste the code generated above to check it"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# United States of America Python Dictionary to translate States,\n",
"# Districts & Territories to Two-Letter codes and vice versa.\n",
"#\n",
"# Canonical URL: https://gist.github.com/rogerallen/1583593\n",
"#\n",
"# Dedicated to the public domain. To the extent possible under law,\n",
"# Roger Allen has waived all copyright and related or neighboring\n",
"# rights to this code. Data originally from Wikipedia at the url:\n",
"# https://en.wikipedia.org/wiki/ISO_3166-2:US\n",
"#\n",
"# Automatically Generated 2021-09-11 18:04:36 via Jupyter Notebook from\n",
"# https://gist.github.com/rogerallen/d75440e8e5ea4762374dfd5c1ddf84e0 \n",
"\n",
"us_state_to_abbrev = {\n",
" \"Alabama\": \"AL\",\n",
" \"Alaska\": \"AK\",\n",
" \"Arizona\": \"AZ\",\n",
" \"Arkansas\": \"AR\",\n",
" \"California\": \"CA\",\n",
" \"Colorado\": \"CO\",\n",
" \"Connecticut\": \"CT\",\n",
" \"Delaware\": \"DE\",\n",
" \"Florida\": \"FL\",\n",
" \"Georgia\": \"GA\",\n",
" \"Hawaii\": \"HI\",\n",
" \"Idaho\": \"ID\",\n",
" \"Illinois\": \"IL\",\n",
" \"Indiana\": \"IN\",\n",
" \"Iowa\": \"IA\",\n",
" \"Kansas\": \"KS\",\n",
" \"Kentucky\": \"KY\",\n",
" \"Louisiana\": \"LA\",\n",
" \"Maine\": \"ME\",\n",
" \"Maryland\": \"MD\",\n",
" \"Massachusetts\": \"MA\",\n",
" \"Michigan\": \"MI\",\n",
" \"Minnesota\": \"MN\",\n",
" \"Mississippi\": \"MS\",\n",
" \"Missouri\": \"MO\",\n",
" \"Montana\": \"MT\",\n",
" \"Nebraska\": \"NE\",\n",
" \"Nevada\": \"NV\",\n",
" \"New Hampshire\": \"NH\",\n",
" \"New Jersey\": \"NJ\",\n",
" \"New Mexico\": \"NM\",\n",
" \"New York\": \"NY\",\n",
" \"North Carolina\": \"NC\",\n",
" \"North Dakota\": \"ND\",\n",
" \"Ohio\": \"OH\",\n",
" \"Oklahoma\": \"OK\",\n",
" \"Oregon\": \"OR\",\n",
" \"Pennsylvania\": \"PA\",\n",
" \"Rhode Island\": \"RI\",\n",
" \"South Carolina\": \"SC\",\n",
" \"South Dakota\": \"SD\",\n",
" \"Tennessee\": \"TN\",\n",
" \"Texas\": \"TX\",\n",
" \"Utah\": \"UT\",\n",
" \"Vermont\": \"VT\",\n",
" \"Virginia\": \"VA\",\n",
" \"Washington\": \"WA\",\n",
" \"West Virginia\": \"WV\",\n",
" \"Wisconsin\": \"WI\",\n",
" \"Wyoming\": \"WY\",\n",
" \"District of Columbia\": \"DC\",\n",
" \"American Samoa\": \"AS\",\n",
" \"Guam\": \"GU\",\n",
" \"Northern Mariana Islands\": \"MP\",\n",
" \"Puerto Rico\": \"PR\",\n",
" \"United States Minor Outlying Islands\": \"UM\",\n",
" \"U.S. Virgin Islands\": \"VI\",\n",
"}\n",
" \n",
"# invert the dictionary\n",
"abbrev_to_us_state = dict(map(reversed, us_state_to_abbrev.items()))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('Wisconsin', 'WI')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# quick check\n",
"abbrev_to_us_state['WI'], us_state_to_abbrev[\"Wisconsin\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.3 64-bit ('base': conda)",
"language": "python",
"name": "python38364bitbaseconda928955c9788047afaf3fe4597c084b52"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
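The notebook verifies its output by pasting the generated code into a new cell and spot-checking one entry. A minimal sketch of automating that check follows. `make_dict_source` and `check_dict_source` are hypothetical helpers, not part of the notebook, and the sketch assumes the scraped data is already available as (name, code) pairs. It uses `repr()` rather than hand-written quotes so that a name containing any quote character still yields valid Python, runs the result with `exec()` in a fresh namespace, and confirms that the same `dict(map(reversed, ...))` inversion used in the gists does not lose entries.

```python
# Hypothetical helper, not from the notebook: build dictionary source text
# with repr() so that names containing quote characters are escaped correctly.
def make_dict_source(var_name, rows):
    """Return Python source that defines var_name as a {name: code} dict."""
    lines = [f"{var_name} = {{"]
    for name, code in rows:
        lines.append(f"    {name!r}: {code!r},")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical helper: execute generated source and verify the round trip,
# replacing the manual "paste the code generated above to check" step.
def check_dict_source(source, var_name):
    """Exec source in a fresh namespace; return (forward, inverse) dicts."""
    namespace = {}
    exec(source, namespace)
    forward = namespace[var_name]
    # Same inversion idiom as the generated gists.
    inverse = dict(map(reversed, forward.items()))
    # Inversion is only lossless when every code is unique.
    if len(inverse) != len(forward):
        raise ValueError("duplicate codes: inverse dictionary lost entries")
    for name, code in forward.items():
        assert inverse[code] == name
    return forward, inverse


# A few rows copied from the generated country dictionary above.
sample_rows = [
    ("Côte d'Ivoire", "CI"),
    ("Denmark", "DK"),
    ("Korea (Democratic People's Republic of)", "KP"),
]
source = make_dict_source("country_to_abbrev", sample_rows)
forward, inverse = check_dict_source(source, "country_to_abbrev")
print(inverse["DK"], forward["Denmark"])  # prints: Denmark DK
```

Note that `repr()` emits single-quoted strings for most names, so this output style differs from the double-quoted gists. The length comparison matters because `dict()` silently keeps only the last name for a repeated code.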
@ahmedshahriar
ahmedshahriar commented Dec 19, 2021

Thank you @rogerallen for sharing
