@laurenmarietta
Created August 17, 2023 18:50
IETF Inclusive Language Study
{
"cells": [
{
"cell_type": "markdown",
"id": "33b79bfb",
"metadata": {},
"source": [
"# Exploring Inclusive Language on IETF Mailing Lists with BigBang\n",
"*August 2023*<br>\n",
"*Center for Democracy and Technology*<br>\n",
"*Lauren Chambers (lauren@ischool.berkeley.edu)*\n",
"\n",
"This notebook is modified from the BigBang package Tutorial: https://github.com/datactive/bigbang/blob/main/examples/Tutorial.ipynb"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "bd72afcf",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.\n",
"Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.\n"
]
}
],
"source": [
"# Import necessary packages\n",
"import datetime\n",
"import glob\n",
"import os\n",
"import re\n",
"\n",
"import matplotlib.dates as mdates\n",
"import matplotlib.pyplot as plt\n",
"import nltk\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# NOTE: Must have manually installed BigBang from source first, see:\n",
"# https://github.com/datactive/bigbang#installation\n",
"import bigbang.archive as archive"
]
},
{
"cell_type": "markdown",
"id": "6eb733ad",
"metadata": {},
"source": [
"# Data ingress\n",
"## Download IETF archives"
]
},
{
"cell_type": "markdown",
"id": "1925d89d",
"metadata": {},
"source": [
"Before you can analyze the IETF data, you must download it. Likely, this will look like specifying a list of IETF mailing lists (like [this one](https://github.com/datactive/bigbang/blob/main/examples/url_collections/mm.ietf.org.txt)) and then running this command in the Terminal:\n",
"\n",
"`bigbang collect-mail --file path/to/ietf_urls.txt`"
]
},
{
"cell_type": "markdown",
"id": "4f0c8118",
"metadata": {},
"source": [
"## Load IETF archives"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "b97c7e09",
"metadata": {},
"outputs": [],
"source": [
"# Get list of filepaths to each mailing list archive\n",
"archive_dir = \"/path/to/ietf/archives\"\n",
"all_ietf_lists = [os.path.basename(d) for d in glob.glob(archive_dir + \"/*\") if \".csv\" not in d]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "d1001ee0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<bigbang.archive.Archive at 0x146e77350>,\n",
" <bigbang.archive.Archive at 0x1466ece10>,\n",
" <bigbang.archive.Archive at 0x146e53410>,\n",
" <bigbang.archive.Archive at 0x146da9810>,\n",
" <bigbang.archive.Archive at 0x1466eac90>,\n",
" <bigbang.archive.Archive at 0x146eb3e50>,\n",
" <bigbang.archive.Archive at 0x146ee0fd0>,\n",
" <bigbang.archive.Archive at 0x13e1e87d0>,\n",
" <bigbang.archive.Archive at 0x146eb1610>,\n",
" <bigbang.archive.Archive at 0x146eb0d90>,\n",
" <bigbang.archive.Archive at 0x1439841d0>,\n",
" <bigbang.archive.Archive at 0x146de1b90>,\n",
" <bigbang.archive.Archive at 0x143971790>,\n",
" <bigbang.archive.Archive at 0x146ee1910>,\n",
" <bigbang.archive.Archive at 0x146e72bd0>,\n",
" <bigbang.archive.Archive at 0x13a6dced0>,\n",
" <bigbang.archive.Archive at 0x1439726d0>,\n",
" <bigbang.archive.Archive at 0x146edf6d0>,\n",
" <bigbang.archive.Archive at 0x13e1b1690>]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a bigbang.archive.Archive() object for each mailing list\n",
"archives = []\n",
"archive_names = []\n",
"empty_csv = []\n",
"no_mail_files = []\n",
"\n",
"for mailing_list in all_ietf_lists:\n",
" try:\n",
" a = archive.Archive(mailing_list, archive_dir=archive_dir) \n",
" archives.append(a)\n",
" archive_names.append(mailing_list)\n",
" except archive.MissingDataException as e:\n",
" if \"Archive after initial processing is empty\" in str(e):\n",
" empty_csv.append(mailing_list)\n",
" else:\n",
" print(mailing_list)\n",
" raise\n",
" except archive.ArchiveWarning as e:\n",
" if \"The error message is: 'NoneType' object is not subscriptable\" in str(e):\n",
" no_mail_files.append(mailing_list)\n",
" else:\n",
" print(mailing_list)\n",
" raise\n",
" except:\n",
" print(mailing_list)\n",
" raise\n",
" \n",
"archives"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "ddddbf03",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All archives downloaded: 1179\n",
"Successfully loaded: 19\n",
"Failed to load; emails downloaded incorrectly: 0\n",
"Failed to load; no emails in archive: 1\n"
]
}
],
"source": [
"# See how many worked\n",
"print(\"All archives downloaded:\", len(all_ietf_lists))\n",
"print(\"Successfully loaded:\", len(archives))\n",
"print(\"Failed to load; emails downloaded incorrectly:\", len(empty_csv))\n",
"print(\"Failed to load; no emails in archive:\", len(no_mail_files))"
]
},
{
"cell_type": "markdown",
"id": "8ad3ccd6",
"metadata": {},
"source": [
"# Conduct Word Analysis \n",
"## Count words"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "3b53c6f5",
"metadata": {},
"outputs": [],
"source": [
"def count_word(text, word, exact=False, ignore_reply_quotes=True):\n",
" '''\n",
" exact - `word` must be bordered by spaces or punctuation.\n",
" ignore_reply_quotes - don't process lines that start with '>' (copied text in email replies)\n",
" '''\n",
" if not text:\n",
" return 0\n",
" \n",
" if isinstance(text, float) and np.isnan(text):\n",
" return 0\n",
" \n",
" if ignore_reply_quotes:\n",
" text = re.sub(\"\\\\n>.*?(?=\\\\n|$)\", '', \n",
" text)\n",
" \n",
" # For a single word\n",
" if exact and len(word.split(\" \")) <= 1:\n",
" ## normalize the text - remove apostrophe and punctuation, lower case\n",
" normalized_text = re.sub(r'[^\\w]', ' ',text.replace(\"'\",\"\")).lower()\n",
" \n",
" # tokenize the text - split it into a list along word/phrase boundaries\n",
" tokenized_text = nltk.tokenize.word_tokenize(normalized_text)\n",
" \n",
" if exact:\n",
" return tokenized_text.count(word)\n",
" \n",
" # For a phrase\n",
" else:\n",
" return text.lower().count(word)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "f4b1b4e4",
"metadata": {},
"outputs": [],
"source": [
"# Define words to count\n",
"checkwords = [\"whitelist\", \"blacklist\", \"slave\", \"master\"]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "e7666171",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 91all\n",
"1 sframe\n",
"2 babel\n",
"3 lsd\n",
"4 6band\n",
"5 yot\n",
"6 oam\n",
"7 82attendees\n",
"8 iola-wgcharter-tool\n",
"9 103all\n",
"10 secdir\n",
"11 mipshop\n",
"12 80attendees\n",
"13 pidloc\n",
"14 earlywarning\n",
"15 cnrp\n",
"16 6lowapp\n",
"17 107attendees\n",
"18 78attendees\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>91all whitelist</th>\n",
" <th>91all blacklist</th>\n",
" <th>91all slave</th>\n",
" <th>91all master</th>\n",
" <th>sframe whitelist</th>\n",
" <th>sframe blacklist</th>\n",
" <th>sframe slave</th>\n",
" <th>sframe master</th>\n",
" <th>babel whitelist</th>\n",
" <th>babel blacklist</th>\n",
" <th>...</th>\n",
" <th>6lowapp slave</th>\n",
" <th>6lowapp master</th>\n",
" <th>107attendees whitelist</th>\n",
" <th>107attendees blacklist</th>\n",
" <th>107attendees slave</th>\n",
" <th>107attendees master</th>\n",
" <th>78attendees whitelist</th>\n",
" <th>78attendees blacklist</th>\n",
" <th>78attendees slave</th>\n",
" <th>78attendees master</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1999-10-12</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-14</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-18</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-20</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-11-01</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-08</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-10</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-11</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-14</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-15</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4731 rows × 76 columns</p>\n",
"</div>"
],
"text/plain": [
" 91all whitelist 91all blacklist 91all slave 91all master \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"2022-04-08 NaN NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN NaN \n",
"\n",
" sframe whitelist sframe blacklist sframe slave sframe master \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"2022-04-08 NaN NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN NaN \n",
"\n",
" babel whitelist babel blacklist ... 6lowapp slave \\\n",
"Date ... \n",
"1999-10-12 NaN NaN ... NaN \n",
"1999-10-14 NaN NaN ... NaN \n",
"1999-10-18 NaN NaN ... NaN \n",
"1999-10-20 NaN NaN ... NaN \n",
"1999-11-01 NaN NaN ... NaN \n",
"... ... ... ... ... \n",
"2022-04-08 NaN NaN ... NaN \n",
"2022-04-10 NaN NaN ... NaN \n",
"2022-04-11 NaN NaN ... NaN \n",
"2022-04-14 NaN NaN ... NaN \n",
"2022-04-15 NaN NaN ... NaN \n",
"\n",
" 6lowapp master 107attendees whitelist 107attendees blacklist \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN \n",
"... ... ... ... \n",
"2022-04-08 NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN \n",
"\n",
" 107attendees slave 107attendees master 78attendees whitelist \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN \n",
"... ... ... ... \n",
"2022-04-08 NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN \n",
"\n",
" 78attendees blacklist 78attendees slave 78attendees master \n",
"Date \n",
"1999-10-12 NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN \n",
"... ... ... ... \n",
"2022-04-08 NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN \n",
"\n",
"[4731 rows x 76 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Count how many times each checkword appears in each mailing list archive\n",
"archive_word_data = {}\n",
"for i, a in enumerate(archives):\n",
" print (i, archive_names[i])\n",
" data = a.data.copy()\n",
" dates = data['Date'].map(lambda x: x.date())\n",
" data['Date'] = dates\n",
" \n",
" for word in checkwords:\n",
" data[word] = data['Body'].apply(lambda x: count_word(x,word, exact=True))\n",
" archive_word_data[f\"{archive_names[i]} {word}\"] = data.groupby('Date')[word].sum()\n",
"\n",
"word_data = pd.DataFrame(archive_word_data)\n",
"word_data.index = pd.to_datetime(word_data.index) # Make the index a datetime object\n",
"word_data"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "42dfc1dc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>91all whitelist</th>\n",
" <th>91all blacklist</th>\n",
" <th>91all slave</th>\n",
" <th>91all master</th>\n",
" <th>sframe whitelist</th>\n",
" <th>sframe blacklist</th>\n",
" <th>sframe slave</th>\n",
" <th>sframe master</th>\n",
" <th>babel whitelist</th>\n",
" <th>babel blacklist</th>\n",
" <th>...</th>\n",
" <th>6lowapp slave</th>\n",
" <th>6lowapp master</th>\n",
" <th>107attendees whitelist</th>\n",
" <th>107attendees blacklist</th>\n",
" <th>107attendees slave</th>\n",
" <th>107attendees master</th>\n",
" <th>78attendees whitelist</th>\n",
" <th>78attendees blacklist</th>\n",
" <th>78attendees slave</th>\n",
" <th>78attendees master</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Date</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1999-10-12</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-14</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-18</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-20</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-11-01</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-08</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-10</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-11</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-14</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-15</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>...</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4731 rows × 76 columns</p>\n",
"</div>"
],
"text/plain": [
" 91all whitelist 91all blacklist 91all slave 91all master \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"2022-04-08 NaN NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN NaN \n",
"\n",
" sframe whitelist sframe blacklist sframe slave sframe master \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"2022-04-08 NaN NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN NaN \n",
"\n",
" babel whitelist babel blacklist ... 6lowapp slave \\\n",
"Date ... \n",
"1999-10-12 NaN NaN ... NaN \n",
"1999-10-14 NaN NaN ... NaN \n",
"1999-10-18 NaN NaN ... NaN \n",
"1999-10-20 NaN NaN ... NaN \n",
"1999-11-01 NaN NaN ... NaN \n",
"... ... ... ... ... \n",
"2022-04-08 NaN NaN ... NaN \n",
"2022-04-10 NaN NaN ... NaN \n",
"2022-04-11 NaN NaN ... NaN \n",
"2022-04-14 NaN NaN ... NaN \n",
"2022-04-15 NaN NaN ... NaN \n",
"\n",
" 6lowapp master 107attendees whitelist 107attendees blacklist \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN \n",
"... ... ... ... \n",
"2022-04-08 NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN \n",
"\n",
" 107attendees slave 107attendees master 78attendees whitelist \\\n",
"Date \n",
"1999-10-12 NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN \n",
"... ... ... ... \n",
"2022-04-08 NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN \n",
"\n",
" 78attendees blacklist 78attendees slave 78attendees master \n",
"Date \n",
"1999-10-12 NaN NaN NaN \n",
"1999-10-14 NaN NaN NaN \n",
"1999-10-18 NaN NaN NaN \n",
"1999-10-20 NaN NaN NaN \n",
"1999-11-01 NaN NaN NaN \n",
"... ... ... ... \n",
"2022-04-08 NaN NaN NaN \n",
"2022-04-10 NaN NaN NaN \n",
"2022-04-11 NaN NaN NaN \n",
"2022-04-14 NaN NaN NaN \n",
"2022-04-15 NaN NaN NaN \n",
"\n",
"[4731 rows x 76 columns]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remove erroneous dates before 1970 or after 2023\n",
"all_dates = word_data.index\n",
"word_data = word_data[word_data.index.year > 1970]\n",
"word_data = word_data[word_data.index.year <= 2023]\n",
"word_data"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "94c3b35d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>91all whitelist</th>\n",
" <th>91all blacklist</th>\n",
" <th>91all slave</th>\n",
" <th>91all master</th>\n",
" <th>sframe whitelist</th>\n",
" <th>sframe blacklist</th>\n",
" <th>sframe slave</th>\n",
" <th>sframe master</th>\n",
" <th>babel whitelist</th>\n",
" <th>babel blacklist</th>\n",
" <th>...</th>\n",
" <th>6lowapp slave</th>\n",
" <th>6lowapp master</th>\n",
" <th>107attendees whitelist</th>\n",
" <th>107attendees blacklist</th>\n",
" <th>107attendees slave</th>\n",
" <th>107attendees master</th>\n",
" <th>78attendees whitelist</th>\n",
" <th>78attendees blacklist</th>\n",
" <th>78attendees slave</th>\n",
" <th>78attendees master</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1999-10-12</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-13</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-14</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-15</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-10-16</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-11</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-12</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-13</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-14</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-15</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8222 rows × 76 columns</p>\n",
"</div>"
],
"text/plain": [
" 91all whitelist 91all blacklist 91all slave 91all master \\\n",
"1999-10-12 0.0 0.0 0.0 0.0 \n",
"1999-10-13 0.0 0.0 0.0 0.0 \n",
"1999-10-14 0.0 0.0 0.0 0.0 \n",
"1999-10-15 0.0 0.0 0.0 0.0 \n",
"1999-10-16 0.0 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 0.0 \n",
"\n",
" sframe whitelist sframe blacklist sframe slave sframe master \\\n",
"1999-10-12 0.0 0.0 0.0 0.0 \n",
"1999-10-13 0.0 0.0 0.0 0.0 \n",
"1999-10-14 0.0 0.0 0.0 0.0 \n",
"1999-10-15 0.0 0.0 0.0 0.0 \n",
"1999-10-16 0.0 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 0.0 \n",
"\n",
" babel whitelist babel blacklist ... 6lowapp slave \\\n",
"1999-10-12 0.0 0.0 ... 0.0 \n",
"1999-10-13 0.0 0.0 ... 0.0 \n",
"1999-10-14 0.0 0.0 ... 0.0 \n",
"1999-10-15 0.0 0.0 ... 0.0 \n",
"1999-10-16 0.0 0.0 ... 0.0 \n",
"... ... ... ... ... \n",
"2022-04-11 0.0 0.0 ... 0.0 \n",
"2022-04-12 0.0 0.0 ... 0.0 \n",
"2022-04-13 0.0 0.0 ... 0.0 \n",
"2022-04-14 0.0 0.0 ... 0.0 \n",
"2022-04-15 0.0 0.0 ... 0.0 \n",
"\n",
" 6lowapp master 107attendees whitelist 107attendees blacklist \\\n",
"1999-10-12 0.0 0.0 0.0 \n",
"1999-10-13 0.0 0.0 0.0 \n",
"1999-10-14 0.0 0.0 0.0 \n",
"1999-10-15 0.0 0.0 0.0 \n",
"1999-10-16 0.0 0.0 0.0 \n",
"... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 \n",
"\n",
" 107attendees slave 107attendees master 78attendees whitelist \\\n",
"1999-10-12 0.0 0.0 0.0 \n",
"1999-10-13 0.0 0.0 0.0 \n",
"1999-10-14 0.0 0.0 0.0 \n",
"1999-10-15 0.0 0.0 0.0 \n",
"1999-10-16 0.0 0.0 0.0 \n",
"... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 \n",
"\n",
" 78attendees blacklist 78attendees slave 78attendees master \n",
"1999-10-12 0.0 0.0 0.0 \n",
"1999-10-13 0.0 0.0 0.0 \n",
"1999-10-14 0.0 0.0 0.0 \n",
"1999-10-15 0.0 0.0 0.0 \n",
"1999-10-16 0.0 0.0 0.0 \n",
"... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 \n",
"\n",
"[8222 rows x 76 columns]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fill in dataframe for every date in the date range\n",
"new_date_range = pd.date_range(start=word_data.index.min(), end=word_data.index.max(), freq=\"D\")\n",
"word_data = word_data.reindex(new_date_range, fill_value=0) \n",
"word_data = word_data.fillna(0)\n",
"word_data"
]
},
{
"cell_type": "markdown",
"id": "fe3cc711",
"metadata": {},
"source": [
"## Correct for anomalous trends in the data\n",
"### Count automated GitHub emails containing \"refs/heads/master\" separately"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "3f2d9a31",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"sframe master\n",
"babel master\n",
"lsd master\n",
"yot master\n",
"82attendees master\n",
"secdir master\n",
"mipshop master\n",
"80attendees master\n",
"cnrp master\n",
"6lowapp master\n",
"78attendees master\n"
]
}
],
"source": [
"for column in word_data.columns:\n",
" if re.match(\".*master$\", column) is not None:\n",
" c = word_data[column]\n",
" if c.sum() != 0:\n",
" print(column)\n",
" archive_name = column.split(\" m\")[0]\n",
" word_data[archive_name + \" master_github\"] = 0 # add new empty column\n",
" archive_data = archives[archive_names.index(archive_name)].data[[\"Date\", \"Body\"]]\n",
" for _, row in archive_data.iterrows():\n",
" if type(row.Body) == str and \"master\" in row.Body:\n",
" github_masters = row.Body.count(\"refs/heads/master\")\n",
" email_date = pd.Timestamp(row.Date.date())\n",
" if github_masters > 0:\n",
" print(archive_name, email_date, github_masters)\n",
" date_index = word_data.index == email_date\n",
" word_data.loc[date_index, archive_name + \" master_github\"] += github_masters\n",
" \n",
"# Subtract \"master_github\" column from \"master\" column\n",
"for c in word_data.columns:\n",
" if \"github\" in c:\n",
" og_archive_name = c.split(\"_github\")[0]\n",
" word_data[og_archive_name] = word_data[og_archive_name] - word_data[c]"
]
},
{
"cell_type": "markdown",
"id": "3be5180f",
"metadata": {},
"source": [
"### Count automated spam emails containing \"safetyslave.com\" separately"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "ec2b0a8c",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"babel slave\n",
"secdir slave\n",
"cnrp slave\n"
]
}
],
"source": [
"for column in word_data.columns:\n",
" if \"slave\" in column:\n",
" c = word_data[column]\n",
" if c.sum() != 0:\n",
" print(column)\n",
" archive_name = column.split(\" slave\")[0]\n",
" word_data[archive_name + \" safetyslave\"] = 0 # add new empty column\n",
" archive_data = archives[archive_names.index(archive_name)].data[[\"Date\", \"Body\"]]\n",
" for _, row in archive_data.iterrows():\n",
" if type(row.Body) == str and \"slave\" in row.Body:\n",
" safetyslave_url = row.Body.count(\"://safetyslave.com\")\n",
" email_date = pd.Timestamp(row.Date.date())\n",
" if safetyslave_url > 0:\n",
" print(archive_name, email_date, safetyslave_url)\n",
" date_index = word_data.index == email_date\n",
" word_data.loc[date_index, archive_name + \" safetyslave\"] += safetyslave_url\n",
" \n",
"# Subtract!\n",
"for c in word_data.columns:\n",
" if \"safetyslave\" in c:\n",
" og_archive_name = c.split(\"safety\")[0] + \"slave\"\n",
" word_data[og_archive_name] = word_data[og_archive_name] - word_data[c]"
]
},
{
"cell_type": "markdown",
"id": "4fa7f27e",
"metadata": {},
"source": [
"## Analyze word counts"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "2c52ec6a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>91all whitelist</th>\n",
" <th>91all blacklist</th>\n",
" <th>91all slave</th>\n",
" <th>91all master</th>\n",
" <th>sframe whitelist</th>\n",
" <th>sframe blacklist</th>\n",
" <th>sframe slave</th>\n",
" <th>sframe master</th>\n",
" <th>babel whitelist</th>\n",
" <th>babel blacklist</th>\n",
" <th>...</th>\n",
" <th>82attendees master_github</th>\n",
" <th>secdir master_github</th>\n",
" <th>mipshop master_github</th>\n",
" <th>80attendees master_github</th>\n",
" <th>cnrp master_github</th>\n",
" <th>6lowapp master_github</th>\n",
" <th>78attendees master_github</th>\n",
" <th>babel safetyslave</th>\n",
" <th>secdir safetyslave</th>\n",
" <th>cnrp safetyslave</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1999-11-10</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-11-11</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-11-12</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-11-13</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-11-14</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-11</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-12</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-13</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-14</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2022-04-15</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>8193 rows × 90 columns</p>\n",
"</div>"
],
"text/plain": [
" 91all whitelist 91all blacklist 91all slave 91all master \\\n",
"1999-11-10 0.0 0.0 0.0 0.0 \n",
"1999-11-11 0.0 0.0 0.0 0.0 \n",
"1999-11-12 0.0 0.0 0.0 0.0 \n",
"1999-11-13 0.0 0.0 0.0 0.0 \n",
"1999-11-14 0.0 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 0.0 \n",
"\n",
" sframe whitelist sframe blacklist sframe slave sframe master \\\n",
"1999-11-10 0.0 0.0 0.0 0.0 \n",
"1999-11-11 0.0 0.0 0.0 0.0 \n",
"1999-11-12 0.0 0.0 0.0 0.0 \n",
"1999-11-13 0.0 0.0 0.0 0.0 \n",
"1999-11-14 0.0 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 0.0 \n",
"\n",
" babel whitelist babel blacklist ... 82attendees master_github \\\n",
"1999-11-10 0.0 0.0 ... 0.0 \n",
"1999-11-11 0.0 0.0 ... 0.0 \n",
"1999-11-12 0.0 0.0 ... 0.0 \n",
"1999-11-13 0.0 0.0 ... 0.0 \n",
"1999-11-14 0.0 0.0 ... 0.0 \n",
"... ... ... ... ... \n",
"2022-04-11 0.0 0.0 ... 0.0 \n",
"2022-04-12 0.0 0.0 ... 0.0 \n",
"2022-04-13 0.0 0.0 ... 0.0 \n",
"2022-04-14 0.0 0.0 ... 0.0 \n",
"2022-04-15 0.0 0.0 ... 0.0 \n",
"\n",
" secdir master_github mipshop master_github \\\n",
"1999-11-10 0.0 0.0 \n",
"1999-11-11 0.0 0.0 \n",
"1999-11-12 0.0 0.0 \n",
"1999-11-13 0.0 0.0 \n",
"1999-11-14 0.0 0.0 \n",
"... ... ... \n",
"2022-04-11 0.0 0.0 \n",
"2022-04-12 0.0 0.0 \n",
"2022-04-13 0.0 0.0 \n",
"2022-04-14 0.0 0.0 \n",
"2022-04-15 0.0 0.0 \n",
"\n",
" 80attendees master_github cnrp master_github \\\n",
"1999-11-10 0.0 0.0 \n",
"1999-11-11 0.0 0.0 \n",
"1999-11-12 0.0 0.0 \n",
"1999-11-13 0.0 0.0 \n",
"1999-11-14 0.0 0.0 \n",
"... ... ... \n",
"2022-04-11 0.0 0.0 \n",
"2022-04-12 0.0 0.0 \n",
"2022-04-13 0.0 0.0 \n",
"2022-04-14 0.0 0.0 \n",
"2022-04-15 0.0 0.0 \n",
"\n",
" 6lowapp master_github 78attendees master_github \\\n",
"1999-11-10 0.0 0.0 \n",
"1999-11-11 0.0 0.0 \n",
"1999-11-12 0.0 0.0 \n",
"1999-11-13 0.0 0.0 \n",
"1999-11-14 0.0 0.0 \n",
"... ... ... \n",
"2022-04-11 0.0 0.0 \n",
"2022-04-12 0.0 0.0 \n",
"2022-04-13 0.0 0.0 \n",
"2022-04-14 0.0 0.0 \n",
"2022-04-15 0.0 0.0 \n",
"\n",
" babel safetyslave secdir safetyslave cnrp safetyslave \n",
"1999-11-10 0.0 0.0 0.0 \n",
"1999-11-11 0.0 0.0 0.0 \n",
"1999-11-12 0.0 0.0 0.0 \n",
"1999-11-13 0.0 0.0 0.0 \n",
"1999-11-14 0.0 0.0 0.0 \n",
"... ... ... ... \n",
"2022-04-11 0.0 0.0 0.0 \n",
"2022-04-12 0.0 0.0 0.0 \n",
"2022-04-13 0.0 0.0 0.0 \n",
"2022-04-14 0.0 0.0 0.0 \n",
"2022-04-15 0.0 0.0 0.0 \n",
"\n",
"[8193 rows x 90 columns]"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Calculate a rolling monthly average\n",
"window = 30 \n",
"plot_data = word_data.rolling(window).mean().dropna(how='all')\n",
"plot_data"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "4ebe9484",
"metadata": {},
"outputs": [],
"source": [
"blacklist_names = [c for c in plot_data.columns if \"blacklist\" in c]\n",
"whitelist_names = [c for c in plot_data.columns if \"whitelist\" in c]\n",
"slave_names = [c for c in plot_data.columns if \" slave\" in c]\n",
"master_names = [c for c in plot_data.columns if re.match(\".*master$\", c) is not None]\n",
"\n",
"blacklist_data = plot_data[blacklist_names]\n",
"whitelist_data = plot_data[whitelist_names]\n",
"slave_data = plot_data[slave_names]\n",
"master_data = plot_data[master_names]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "6ec33637",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 640x480 with 0 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x600 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig = plt.figure()\n",
"fig, [ax1, ax2, ax3, ax4] = plt.subplots(nrows=4, ncols=1, figsize=(8, 6))\n",
"\n",
"ax1.plot(blacklist_data.index, blacklist_data.sum(axis=1), label=\"'blacklist'\")\n",
"ax1.text(0.01, 1, \"'blacklist'\", horizontalalignment='left',\n",
" verticalalignment='top', transform=ax1.transAxes, fontsize=14)\n",
"ax1.text(datetime.datetime.strptime(\"12-22-2018\", \"%m-%d-%Y\"), blacklist_data.max().max(), \n",
" \"Inclusive\\nlanguage\\ndraft\", horizontalalignment='left',\n",
" verticalalignment='top', fontsize=10.5)\n",
"ax2.plot(whitelist_data.index, whitelist_data.sum(axis=1), label=\"'whitelist'\")\n",
"ax2.text(0.01, 1, \"'whitelist'\", horizontalalignment='left',\n",
" verticalalignment='top', transform=ax2.transAxes, fontsize=14)\n",
"ax3.plot(master_data.index, master_data.sum(axis=1), label=\"'master'\")\n",
"ax3.text(0.01, 1, \"'master'\", horizontalalignment='left',\n",
" verticalalignment='top', transform=ax3.transAxes, fontsize=14)\n",
"ax4.plot(slave_data.index, slave_data.sum(axis=1), label=\"'slave'\")\n",
"ax4.text(0.01, 1, \"'slave'\", horizontalalignment='left',\n",
" verticalalignment='top', transform=ax4.transAxes, fontsize=14)\n",
"\n",
"for ax in [ax1, ax2, ax3, ax4]:\n",
" ax.axvline(x=datetime.datetime.strptime(\"10-22-2018\", \"%m-%d-%Y\"), linestyle=':', color=\"grey\", alpha=.5)\n",
" ax.set_xlim(datetime.datetime.strptime(\"01-01-1998\", \"%m-%d-%Y\"),\n",
" datetime.datetime.strptime(\"08-01-2023\", \"%m-%d-%Y\"))\n",
" ax.xaxis.set_major_locator(mdates.YearLocator(2)) \n",
" ax.set_ylabel(\"Monthly average\\nemails\")\n",
" \n",
" ax.grid(visible=True, alpha=.4, linestyle=\":\")\n",
" ax.spines[['right', 'top']].set_visible(False)\n",
" \n",
" start, end = ax.get_ylim()\n",
" ax.yaxis.set_ticks(np.arange(0, end, 5))\n",
" \n",
"ax1.set_xticklabels([])\n",
"ax2.set_xticklabels([])\n",
"ax3.set_xticklabels([])\n",
"\n",
"plt.suptitle(\"Term usage across IETF mailing lists\", fontsize=16)\n",
"\n",
"plt.tight_layout(h_pad=1.2)\n",
"plt.savefig('all_terms.png', dpi=300, bbox_inches='tight')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "bfa1f55d",
"metadata": {},
"source": [
"## Explore which mailing lists use \"master\" the most"
]
},
{
"cell_type": "code",
"execution_count": 52,
"id": "13e29667",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"master_names = [c for c in plot_data.columns if re.match(\".* master$\", c) is not None]\n",
"master_data = plot_data[master_names]\n",
"top_master_cols = master_data.sum(axis=0).sort_values(ascending=False)[:30].index\n",
"\n",
"fig = plt.figure()\n",
"ax = fig.add_subplot(1,1,1) \n",
"\n",
"for c in top_master_cols:\n",
" plt.plot(plot_data[c].index, plot_data[c], alpha=.4, label=c)\n",
"plt.axvline(x=datetime.datetime.strptime(\"10-22-2018\", \"%m-%d-%Y\"), linestyle=':', color=\"grey\", alpha=.5)\n",
"\n",
"plt.xlim(datetime.datetime.strptime(\"01-01-1995\", \"%m-%d-%Y\"),\n",
" datetime.datetime.strptime(\"08-01-2023\", \"%m-%d-%Y\"))\n",
"plt.xticks(rotation=45)\n",
"\n",
"plt.ylabel(\"Daily usage\")\n",
"ax.legend(bbox_to_anchor=(1.1, 1.05))\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "5982a026",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"secdir 2018-07-30 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"babel 2018-01-24 00:00:00\n",
"After babeld 1.9.0 is out (hopefully this week or next), I'd like to merge\n",
"Matthieu's code for the new source-specific extension into babeld master.\n",
"\n",
"Now:\n",
"\n",
" - draft-ietf-babel-source-specific-02 suggests using sub-TLV number 128,\n",
" which is the first number in the mandatory range;\n",
" - Matthieu's code uses sub-TLV number 250, which, according to\n",
" rfc6126bis Section 5, is reserved for experimental use.\n",
"\n",
"What shall I do in babeld master? I guess I could either request that\n",
"IANA allocate number 128 for draft-ietf-babel-source-specific-02, or use\n",
"an experimental sub-TLV number, or I could simply ignore the IETF and\n",
"squat number 128.\n",
"\n",
"Advice?\n",
"\n",
"-- Juliusz\n",
"---------------------------------\n",
"> - draft-ietf-babel-source-specific-02 suggests using sub-TLV number 128=\n",
",\n",
"> which is the first number in the mandatory range;\n",
"> - Matthieu's code uses sub-TLV number 250, which, according to\n",
"> rfc6126bis Section 5, is reserved for experimental use.\n",
">=20\n",
"> What shall I do in babeld master? I guess I could either request that IA=\n",
"NA\n",
"> allocate number 128 for draft-ietf-babel-source-specific-02, or use an\n",
"> experimental sub-TLV number, or I could simply ignore the IETF and squat\n",
"> number 128.\n",
"\n",
"I would suggest requesting IANA allocate.\n",
"But in looking at https://www.iana.org/assignments/babel/babel.xhtml, I'm c=\n",
"urious...\n",
"Why are we leaping over 4 through 127? The unassigned (which I would interp=\n",
"ret as \"reserved-for-mandatory\") range is 4-223 according to the registry p=\n",
"age. 4 would be the next logical value.\n",
"While asking IANA to do things to the babel registry, does it make sense to=\n",
" ask them to get rid of the current TLV allocations for draft-boutier-babel=\n",
"-source-specific?\n",
"13 Source-specific Update [draft-boutier-babel-source-specific]=20\n",
"14 Source-specific Request [draft-boutier-babel-source-specific]=20\n",
"15 Source-specific Seqno Request [draft-boutier-babel-source-specific]\n",
"\n",
"If I can be of any assistance in dealing with IANA, please let me know. I'v=\n",
"e been getting pretty good at this IANA registry stuff.\n",
"Barbara\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"mipshop 2006-06-06 00:00:00\n",
"I am probably overly critical in my review of this document, for a \n",
"number of reasons, including for instance trying to hold this \n",
"document to the standards of RFC 4101. There are also several nits \n",
"in the review which can be safely ignored, but if clarified will \n",
"improve the document IMO.\n",
"\n",
"Summary: The primary suggestion is to follow 4101 and provide a \n",
"summary of the protocol in an easy to review/analyze format in the \n",
"earlier parts of the document. Next, I have a question on whether \n",
"two seemingly complementary techniques (referring to the use of a \n",
"message-ID and a sequence number field) are necessary for replay \n",
"protection. Vidya and I had a subsequent discussion and the \n",
"conclusion was that there may be an advantage to keeping both \n",
"techniques . I think some of that comes from trying to protect \n",
"against a sub-class of adversaries. Perhaps inclusion of the threat \n",
"model in the early parts of the document and referring to that model \n",
"in the rest of the document might help clarify the purpose of the two \n",
"fields, should they be kept in the final specification. Even \n",
"otherwise, inclusion of a protocol model and a threat model is \n",
"strongly encouraged.\n",
"\n",
"Please see inline for detailed comments.\n",
"\n",
"best regards,\n",
"Lakshminath\n",
"\n",
"\n",
" >Mipshop V. Narayanan\n",
" >Internet-Draft Qualcomm, Inc.\n",
" >Expires: October 28, 2006 N. Venkitaraman\n",
" > Motorola Labs\n",
" > H. Tschofenig\n",
" > Siemens\n",
" > G. Giaretta\n",
" > TILab\n",
" > J. Bournelle\n",
" > GET/INT\n",
" > April 26, 2006\n",
"\n",
"\n",
" > Handover Keys Using AAA\n",
" > draft-vidya-mipshop-handover-keys-aaa-02.txt\n",
" >\n",
" >Status of this Memo\n",
" >\n",
" > By submitting this Internet-Draft, each author represents that any\n",
" > applicable patent or other IPR claims of which he or she is aware\n",
" > have been or will be disclosed, and any of which he or she becomes\n",
" > aware will be disclosed, in accordance with Section 6 of BCP 79.\n",
"\n",
" > Internet-Drafts are working documents of the Internet Engineering\n",
" > Task Force (IETF), its areas, and its working groups. Note that\n",
" > other groups may also distribute working documents as Internet-\n",
" > Drafts.\n",
"\n",
" > Internet-Drafts are draft documents valid for a maximum of six months\n",
" > and may be updated, replaced, or obsoleted by other documents at any\n",
" > time. It is inappropriate to use Internet-Drafts as reference\n",
" > material or to cite them other than as \"work in progress.\"\n",
"\n",
" > The list of current Internet-Drafts can be accessed at\n",
" > http://www.ietf.org/ietf/1id-abstracts.txt.\n",
"\n",
" > The list of Internet-Draft Shadow Directories can be accessed at\n",
" > http://www.ietf.org/shadow.html.\n",
"\n",
" > This Internet-Draft will expire on October 28, 2006.\n",
"\n",
" >Copyright Notice\n",
"\n",
" > Copyright (C) The Internet Society (2006).\n",
"\n",
" >Abstract\n",
"\n",
" > This document describes a AAA-assisted key management protocol to\n",
"\n",
"\n",
"\n",
" > generate a handover key between a mobile node (MN) and an access\n",
" > router (AR) for the purpose of securing FMIPv6 signaling messages.\n",
" > As such, it specifies a message exchange between the MN and the AR\n",
" > and assumptions that must hold in order for this protocol to work.\n",
" > The idea is that the key derived between a mobile node and an access\n",
" > router through this protocol can be used in fast handovers.\n",
"{**\n",
"How is the last sentence different from the first? Can the key be\n",
"used to protect non-FMIPv6 fast handover signaling messages? Perhaps\n",
"that can be clarified with \"fast handovers with protocols other than\n",
"FMIPv6?\"}\n",
"\n",
"\n",
"<snip>\n",
"\n",
" > 1.1. Applicability\n",
"\n",
" > Although this document is focused on FMIPv6, it is applicable to\n",
" > other protocols as well. In the context of FMIPv6, the key derived\n",
" > using this protocol can be used to secure the Fast Binding Update\n",
" > (FBU) sent from the MN to the PAR as specified in FMIPv6 [1]. Other\n",
" > protocols may also use this mechanism as noted below. For instance,\n",
" > CxTP [9] can also use this protocol to derive keys between the AR and\n",
" > MN to secure the CTAR message. Additionally, keys between an MN and\n",
" > a MAP can be derived using this protocol for HMIPv6 [10] operation.\n",
" > This draft, however, does not address any details on how the protocol\n",
" > described in this draft may be used for CxTP or HMIPv6.\n",
"\n",
"A number of acronyms being used for the first time in the document,\n",
"please expand them\n",
"\n",
"\n",
"2. Terminology\n",
"\n",
" > The key words \"MUST\", \"MUST NOT\", \"REQUIRED\", \"SHALL\", \"SHALL NOT\",\n",
" > \"SHOULD\", \"SHOULD NOT\", \"RECOMMENDED\", \"MAY\", and \"OPTIONAL\" in this\n",
" > document are to be interpreted as described in RFC- 2119 RFC2119 [2].\n",
"\n",
" > In addition, this document uses the following terms:\n",
"\n",
" > MN-AAA Handover Master Key (HMK):\n",
"\n",
" > A key that is shared between the Mobile Node and the AAA server,\n",
" > hereby referred to simply as the Handover Master Key. The HMK is\n",
" > never used directly to protect any messages.\n",
"\n",
" > MN-AAA Handover Integrity Key (HIK):\n",
"\n",
" > A key that is shared between the Mobile Node and the AAA server,\n",
" > hereby referred to simply as the Handover Integrity Key. The HIK\n",
" > is derived from the HMK and is used for authenticating the HKReq/\n",
" > HKResp messages between the MN and the AAA server.\n",
"\n",
" > Handover Key (HK):\n",
"\n",
" > Session key used to secure messages between the Mobile Node and\n",
" > Access Router. A given HK between an MN and ARn is termed as HKn.\n",
"\n",
"Secure any messages? Or just fast handover signaling messages?\n",
"\n",
"3. Goals, Assumptions and Requirements\n",
"\n",
" > The document generally follows the key management guidelines\n",
" > documented in [11]. This section describes the goals and assumptions\n",
" > that the protocol design is based on.\n",
"\n",
" > o A major goal of the protocol is to leverage the existence of AAA\n",
" > infrastructure to establish session keys for securing FMIPv6\n",
" > signaling messages. AAA protocols are widely deployed today due\n",
" > to the usage of AAA-based network access authentication. The AAA\n",
" > server is used to authenticate and authorize the MN for fast\n",
" > handovers prior to generation of the handover key. The solution\n",
" > presented in this document relies on the AAA infrastructure to\n",
" > derive and distribute keying material for handover support.\n",
"\n",
"Combining the last two sentences: \"The handover keying protocol\n",
"(HKP) uses the AAA server to authenticate and authorize the MN for\n",
"fasthandovers and uses the AAA protocol for derivation and delivery of\n",
"the keying material.\" Something like that.\n",
"\n",
" > o The protocol must be able to provide separation of the keys used\n",
" > for integrity protection and the derivation of the handover keys.\n",
"\n",
"Does a protocol provide separation of keys?\n",
"\n",
" > For this purpose, two keys are derived from the HMK for a given\n",
" > node - the Handover Integrity Key (HIK) and the actual Handover\n",
" > Key (HK). The former is used to create the MAC (Message\n",
" > Authentication Code) in the signaling messages for the key\n",
" > management protocol described in this document and the latter is\n",
" > used for FMIPv6 signaling protection.\n",
"\n",
"How about this? To avoid reusing the HMK for two distinct purposes,\n",
"namely in a PRF for key derivation and in a MAC for integrity\n",
"protection, two keys are derived from the HMK -- an HIK, that protects\n",
"HKP messages and an HK that protects the FMIPv6 signaling messages.\n",
"\n",
" > o Unique key naming of the keys must be provided. Key naming of the\n",
" > HMK and HIK is provided using the Network Access Identifier (NAI)\n",
" > of the MN. In order to provide identity protection, it may be\n",
" > undesirable to reuse the same NAI as used for network access\n",
" > authentication. For sake of identity protection, it may be\n",
" > desirable to have the AAA server to derive an ephemeral identity\n",
" > at the time of HMK creation that can be used in subsequent\n",
" > creation of handover keys. Such privacy protection mechanisms are\n",
" > outside the scope of this document. The HK is also named using\n",
" > the NAI - however, at some point, a valid MN Care-of-address and\n",
" > an SPI are associated with the HK.\n",
"\n",
"The above is not very clear. Could the identity protection text be\n",
"moved to security/privacy considerations? That's out of scope anyway.\n",
"With that all keys are initially identified with the NAI and the HK will\n",
"be associated with an SPI.\n",
"\n",
"<snip>\n",
"\n",
"\n",
" > 4. Protocol Overview\n",
"\n",
" > This section provides a description of the procedure to obtain the\n",
" > Handover Key (HK). We assume that the MN shares a key, called\n",
" > Handover Master Key (HMK), with the AAA server. For the purpose of\n",
" > the protocol itself, the HMK may simply be a handover specific master\n",
" > key pre-shared between the MN and the AAA server.\n",
"\n",
" > A Handover Integrity Key (HIK) is derived from the HMK at the MN and\n",
" > AAA server. The HIK is used to provide integrity protection to\n",
" > messages exchanged by the MN and the AAA server. Also, the actual\n",
" > Handover Key (HK) shared between the MN and the AR is derived from\n",
" > the HMK. The derivation of these keys is described in Section 6.1.\n",
"\n",
" > Figure 1 depicts the high-level protocol interaction.\n",
"\n",
"\n",
"\n",
"\n",
" > MN AR AAAH Server\n",
" > | HKReq. | |\n",
" > | ---------> | |\n",
" > | | AAA Request |\n",
" > | | ----------------------> |\n",
" > | | AAA Response |\n",
" > | | <----------------------- |\n",
" > | HKResp. | (Keying material) |\n",
" > | <--------- | |\n",
"\n",
"\n",
" > Figure 1: Protocol Operation\n",
"\n",
"Please fix the figures. The lines should go all the way to the\n",
"server, right? Why is the AAA server shown as AAAH?\n",
"\n",
" > The MN, upon attaching to an AR (say AR1) in a given administrative\n",
" > domain, sends a Handover Key Request (HKReq) message to the AR. The\n",
" > MN creates a HKReq message including an NAI-like identifier (that was\n",
" > derived possibly during the time of HMK derivation), a message ID, a\n",
" > sequence number and the care-of-address (CoA). The MN also generates\n",
" > a nonce and includes it in the HKReq message. Further, the MN\n",
" > indicates the PRF algorithm that it chooses to use for key generation\n",
" > in the HKReq message. The MN includes a MAC of the message fields in\n",
" > an MN-AAA MAC Mobility sub-option. Upon receiving the HKReq from the\n",
" > MN, AR1 must first ensure that it has a valid neighbor cache entry\n",
" > for the CoA claimed by the MN. AR1 further verifies the validity and\n",
" > uniqueness of the CoA claimed by the MN. After verifying the\n",
" > validity of the CoA of the MN, AR1 forwards the HKReq message to the\n",
" > AAA Server. The MAC in the HKReq allows the AAA server to\n",
" > authenticate the MN and to perform authorization for fast handoffs\n",
" > before deriving a unique and fresh session key, the Handover Key\n",
" > (HK1).\n",
"\n",
" > HK1 will subsequently be used between the MN and the AR1 after a\n",
" > successful protocol execution. The key derivation procedure for the\n",
" > Handover Key is defined as follows:\n",
"\n",
" > HK1 = gprf+ (HMK, AAA Nonce | MN Nonce | AR1 ID | MN ID | \"Handover\n",
" > Key\"), where | indicates concatenation\n",
"\n",
" > The AR1 ID is the IP address of AR1 as seen by the MN. The MN ID is\n",
" > the NAI of the MN that is sent by the MN in the HKReq message. The\n",
" > AAA and MN Nonces are nonces generated by the AAA server and the MN\n",
" > respectively for the purpose of HK1 derivation. The function gprf+\n",
" > is defined in Section 6.1.\n",
"\n",
" > After successful authentication and authorization, the AAA Server\n",
" > then sends two different parameters to AR1 - one of the parameters\n",
" > includes the key HK1 and the lifetime associated with it. The other\n",
" > parameter, to be sent to the MN, includes the AAA Nonce. The message\n",
" > is protected with the AR1-AAA SA when forwarded to AR1.\n",
"\n",
"A few comments on the previous paragraph:\n",
"\n",
"* how about: the AAA server sends two sets of parameters to AR1: HK1\n",
"and associated lifetime destined to AR1 and the AAA nonce to be sent to\n",
"the MN.\n",
"\n",
"* contains the only reference to AR1-AAA SA. A clarification might be\n",
"in order?\n",
"\n",
"<snip>\n",
"\n",
"\n",
" > 5.1. Mobility Header Types\n",
"\n",
" > 5.1.1. Handover Key Request (HKReq)\n",
"\n",
"<snip>\n",
"\n",
" > HKReq Fields\n",
"\n",
" > Message Type\n",
"\n",
" > A one octet field indicating the handover key exchange\n",
" > mechanism encoded in the mobility options, the value of which\n",
" > is taken from the IANA Handover Key Exchange Mechanism Type\n",
" > registry. A value of '1' (to be assigned by IANA) indicates\n",
" > AAA-assisted handover key exchange. Other key exchange\n",
" > protocols defined in future may define additional values.\n",
"\n",
" > v\n",
"\n",
" > Verify flag. This is set by the MN to indicate that it may\n",
" > already share a key with the AR.\n",
"\n",
"if v is set, is the MAC option a MUST/SHOULD/MAY ?\n",
"\n",
" > PRF\n",
"\n",
" > A 3 bit field indicating the PRF algorithm that the MN wishes\n",
" > to use for key generation. This is the algorithm the MN\n",
" > proposes to use in the derivation of HIK and HK from the HMK.\n",
" > Currently, the value of 1 is assigned to indicate HMAC-SHA1.\n",
"\n",
" > Reserved\n",
"\n",
" > Set to 0; ignored on reception.\n",
"\n",
" > Message ID\n",
"\n",
" > A two octet random value used to uniquely match requests and\n",
" > responses and identify retransmissions.\n",
"\n",
" > Sequence #\n",
"\n",
" > A two octet unsigned integer used by the AR in combination with\n",
" > the message ID to detect retransmissions and replays.\n",
"\n",
"My guess is only one of the two fields above may be required, but\n",
"more on that later\n",
"\n",
" > Key Care of Address\n",
"\n",
" > Sixteen octet field containing the IPv6 CoA for which the key\n",
" > is requested.\n",
"\n",
" > Mobility Suboptions\n",
"\n",
" > Variable length field of such length that the complete Mobility\n",
" > Header is an integer multiple of 8 octets. This field contains\n",
" > one or more TLV-encoded mobility suboptions. Valid mobility\n",
" > sub-options for this message include the following:\n",
"\n",
" > Handover Nonce Mobility Option (new option defined in\n",
" > Section 5.2.1 below)\n",
"\n",
" > Mobile Node Identifier Mobility Sub-option, as defined in\n",
" > [3]\n",
"\n",
" > MAC Mobility Option as defined in Section 5.2.2\n",
"\n",
" > Timestamp Mobility Option as defined in Section 5.2.3\n",
"\n",
" > 5.1.2. Handover Key Response (HKResp)\n",
"\n",
" > A handover key response (HKResp) message MUST be sent from the AR to\n",
" > the MN in response to a HKReq message. HKResp MUST carry suboptions\n",
" > appropriate to the key agreement mechanism requested in the HKReq,\n",
" > which the MN can use to derive a handover key.\n",
"\n",
"The second \"MUST\" is vague. Perhaps the requirement can be stated more\n",
"precisely.\n",
"\n",
" > The HKResp message uses the MH Type value TBA2. When this value is\n",
" > indicated in the MH Type field, the format of the Message Data field\n",
" > in the Mobility Header is as follows:\n",
"\n",
"\n",
" > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\n",
" > | Message Type |v|PRF|Reserved|\n",
" > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\n",
" > | Message ID | Sequence # |\n",
" > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\n",
" > | Status Code | Lifetime |\n",
" > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\n",
" > | SPI |\n",
" > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\n",
" > | |\n",
" > . .\n",
" > . Mobility options .\n",
" > . .\n",
" > | |\n",
" > +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\n",
"\n",
"\n",
"\n",
" > Figure 3: Handover Key Response\n",
"\n",
" > HKResp Fields\n",
"\n",
" > Message Type\n",
"\n",
" > A one octet field indicating the key exchange mechanism encoded\n",
" > in the mobility options, the value of which is taken from the\n",
" > IANA Handover Key Agreement Mechanism Type registry. A value\n",
" > of '1' (to be assigned by IANA) indicates AAA-assisted handover\n",
" > key exchange. Other key exchange protocols defined in future\n",
" > may define additional values.\n",
"\n",
" > v\n",
"\n",
" > Verified flag. Set by the AR in response to a HKReq with the V\n",
" > bit set, if it verified the request.\n",
"\n",
"Please use the same case for v throughout\n",
"\n",
" > PRF Type\n",
"\n",
" > A 3 bit field indicating the PRF algorithm that the AAA server\n",
" > used for key generation. This is the algorithm used in the\n",
" > derivation of HIK and HK from the HMK. If the AAA server\n",
" > accepts the algorithm proposed by the MN, it sets the value to\n",
" > the same as in the HKReq. If not, the value in this field\n",
" > indicates the algorithm the MN must use to derive the keys.\n",
"\n",
"s/must/MUST\n",
"\n",
" > Currently, the value of 1 is assigned to indicate HMAC-SHA1.\n",
"\n",
" > Reserved\n",
"\n",
" > Set to 0; ignored on reception.\n",
"\n",
" > Message ID\n",
"\n",
" > A two octet random value used to uniquely match requests and\n",
" > responses and identify retransmissions. This value is copied\n",
" > from the corresponding HKReq.\n",
"\n",
" > Sequence #\n",
"\n",
" > A two octet unsigned integer from the matching HKReq that\n",
" > triggered this message.\n",
"\n",
" > Status Code\n",
"\n",
" > One octet status code indicating the status of the request.\n",
"\n",
" > 0 - Success.\n",
"\n",
" > 129 - HKE_NOT_SUPPORTED.\n",
"\n",
" > 130 - CoA_AUTH_FAILED\n",
"\n",
" > 131 - ADMINISTRATIVELY_PROHIBITED.\n",
"\n",
" > 132 - HK_VERIFY_FAILED\n",
"\n",
" > 133 - MN_AR_MAC_REQD\n",
"\n",
" > 134 - INVALID_AUTH_VALUE\n",
"\n",
" > 135 - TIMESTAMP_REQD\n",
"\n",
" > 136 - INVALID_PRF_ALG\n",
"\n",
" > 137 - INVALID_MAC_ALG\n",
"\n",
" > 138 - INVALID_TIMESTAMP\n",
"\n",
"Some error codes are not listed!\n",
"\n",
" > Lifetime\n",
"\n",
" > Lifetime of the handover key, in seconds.\n",
"\n",
"\n",
" > SPI\n",
"\n",
" > Four octet Security Parameter Index for the handover key at the\n",
" > AR. The SPI is used by the MN and the AR to identify the key\n",
" > when the key is used.\n",
"\n",
"The AR picks this and the MN and AR use it to identify the HK,\n",
"right? My recollection from a discussion earlier is that the SPI alone\n",
"is not sufficient to identify the key and instead <AR-ID, SPI> is used\n",
"at the MN and the SPI is used at the AR, right? Where is that\n",
"specified? Perhaps later in the document.\n",
"\n",
"<snip>\n",
"\n",
" > 6. Protocol Details\n",
"\n",
" > The proposed protocol enables a MN to derive session keys (HKs) with\n",
" > an access router. This section describes the key derivation process\n",
" > and the processing rules for the mobile node, the access router and\n",
" > the AAA server.\n",
"\n",
" > 6.1. Key Derivation\n",
"\n",
" > As mentioned earlier, the MN and the AAA server share a key for the\n",
" > purpose of handover key derivation, called the Handover Master Key\n",
"\n",
"\n",
"\n",
" > (HMK). The HMK may be a pre-shared key or may be derived between the\n",
" > MN and AAA server by means outside the scope of this document. The\n",
" > HMK MUST be at least 64 bytes in length and SHOULD NOT be generated\n",
" > from a password. The HMK MUST NOT be directly used to protect any\n",
" > data or derive handover keys. The HMK is only used to derive the\n",
" > Handover Integrity Key (HIK) and the Handover Key (HK).\n",
"\n",
"Some of the above is repetition; is it necessary?\n",
"Further, there is no need to expand all the key names etc.\n",
"Not sure I understand the notion of not directly using HMK to derive\n",
"handover keys, it is after all used to derive the HK!\n",
"\n",
" > The HMK must be updated periodically to allow the derivation of\n",
" > cryptographically strong handover keys.\n",
"\n",
"Lowercase \"must\"; and how frequently? This belongs in SEC considerations\n",
"on the assumptions about the HMK.\n",
"\n",
" > The MN and the AAA server MUST derive a Handover Integrity Key (HIK)\n",
" > and a Handover Key (HK) from the Handover Master Key. The PRF used in\n",
" > the derivation must be communicated by the MN to the AAA server in\n",
" > the HKReq message. The AAA server may choose to mandate a different\n",
" > PRF type by indicating so in the HKResp message. The key derivation\n",
" > follows the procedure explained in this section.\n",
"\n",
" > For the purpose of deriving keys of variable length that depend on\n",
" > the PRF type and MAC algorithm used, this document uses an adaptation\n",
" > of prf+ defined in RFC4306 [5]. This function is termed \"Generalized\n",
" > prf+ (gprf+)\" and is defined as follows (where | indicates\n",
" > concatenation).\n",
"\n",
" > gprf+ (K, S) = T1 | T2 | T3 | T4 ...\n",
"\n",
" > where:\n",
"\n",
" > T1 = PRF (K, S | Y)\n",
"\n",
" > T2 = PRF (K, T1 | S | Y+1)\n",
"\n",
" > T3 = PRF (K, T2 | S | Y+2)\n",
"\n",
" > T4 = PRF (K, T3 | S | Y+3)\n",
"\n",
" > continuing as needed to compute all required keys. The keys are\n",
" > taken from the output string without regard to boundaries, depending\n",
" > on the length of the key required. These lengths are dependent on\n",
" > the MAC algorithm and the PRF chosen. For instance, if the HIK\n",
" > required is a 256-bit key and the PRF used yields 160-bit keys, the\n",
" > HIK will come from T1 and the beginning of T2. Y is a two-byte\n",
" > hexadecimal value that may differ in different uses of gprf+. Each\n",
" > key derivation procedure below explains the appropriate values of K,\n",
" > S and Y.\n",
"\n",
"The explanation about the length of the key required is repetitious\n",
"and I think that the example is not needed.\n",
"\n",
"s/byte/octet\n",
"\n",
"If Y is different in different uses of the gprf+, shouldn't it be a\n",
"parameter of gprf+?\n",
"\n",
" > 6.1.1. Handover Integrity Key Derivation\n",
"\n",
" > The gprf+ is used as follows to derive the HIK.\n",
"\n",
"\n",
"\n",
" > HIK = gprf+ (HMK, \"Handover Integrity Key\")\n",
"\n",
" > where, the string Handover Integrity Key is a 22-character ascii\n",
" > string with no null termination. The value of Y is the Message ID\n",
" > (in hex) from the HKReq sent by the MN. The Message ID is used in\n",
" > the derivation to allow the values of HIK to change upon every\n",
" > instantiation of the gprf+ for this purpose. Since the HIK\n",
" > derivation does not involve nonces or other changing parameters, the\n",
" > Message ID is included to avoid the use of the same HIK for a long\n",
" > time. This really is provided to allow implementations that don't\n",
" > refresh the HMK appropriately to still be able to have changing HIKs.\n",
" > If the HMK is refreshed periodically, there is not a need to derive a\n",
" > new HIK at every HKReq/HKResp exchange. However, at the time of this\n",
" > writing, it is felt that it will not be uncommon to configure a HMK\n",
" > and not change it for a long period of time.\n",
"\n",
" > The actual length of the HIK required is determined by the PRF used\n",
" > in the derivation. Note that the length of the HIK must be\n",
" > sufficient for the MAC algorithm used. Hence, the choice of PRF must\n",
" > be done such that it results in a sufficiently long key that can be\n",
" > used by the MAC algorithm.\n",
"\n",
"The above paragraph is not required. First, the length semantics\n",
"are clear already. The PRF algorithm's output length is not relevant\n",
"since prf+ is being used.\n",
"\n",
"6.1.2. Handover Key Derivation\n",
"\n",
" > The gprf+ is used as follows to derive the HK (where | indicates\n",
" > concatenation).\n",
"\n",
" > HK = gprf+ (HMK, \"MN Nonce | AAA Nonce | MN ID | AR ID | \"Handover\n",
" > Key\")\n",
"\n",
"there is an extra \" above\n",
"\n",
" > where, the string Handover Key is a 12-character ascii string with no\n",
" > null termination. The MN Nonce is generated by the MN and\n",
" > communicated to the AAA server in the HKReq message. The AAA nonce\n",
" > is generated by the AAA server and communicated to the MN in the\n",
" > HKResp message. The MN ID is the NAI of the MN and the AR ID is the\n",
" > IP address of the AR as seen by the MN. The value of Y for the HK\n",
" > derivation is set to 0x0000.\n",
"\n",
" > The actual length of the HK required is determined by the PRF used in\n",
" > the derivation. Note that the length of the HK must be sufficient\n",
" > for the MAC algorithm used in the protection of FMIPv6 signaling.\n",
" > Hence, the choice of PRF must be done such that it results in a\n",
" > sufficiently long key that can be used by the MAC algorithm.\n",
"\n",
"Same as before: length of HK is dependent on the MAC algorithm used\n",
"for FMIPv6 and that's all. It does not depend on the PRF choice\n",
"\n",
" > More details on the process leading to HK derivation and its usage\n",
" > can be found in sections below.\n",
"\n",
"\n",
"\n",
" > 6.2. Mobile Node Considerations\n",
"\n",
" > After attaching to an AR, an FMIPv6 capable mobile node SHOULD\n",
" > immediately proceed to obtain a session key between itself and its\n",
" > current AR.\n",
"\n",
" > 6.2.1. Sending Handover Key Request Messages\n",
"\n",
" > In order to derive a shared key with the AR, the MN MUST create a\n",
" > HKReq with a unique, random Message ID and a sequence number also\n",
" > starting at a random value. The MN MUST also derive a fresh MN nonce\n",
" > that will be used in HK derivation and include it in the HKReq. For\n",
" > subsequent retransmissions, the sequence number MUST be incremented\n",
" > by 1 and the Message ID MUST remain the same.\n",
"\n",
"It is not clear why the Message ID and the Sequence number are\n",
"needed, but perhaps that is described later? A one-liner here might be\n",
"useful.\n",
"Fresh nonce? Do you mean random nonce?\n",
"\n",
" > In order to obtain absolute replay protection, the MN SHOULD use the\n",
" > Timestamp mobility option defined in Section 5.2.3.\n",
"\n",
"\"Absolute\" replay protection? Please explain what, if any, replay\n",
"protection is offered with other mechanisms and explain why TSs are\n",
"required to claim replay protection. Or, perhaps this can be moved to\n",
"later.\n",
"\n",
" > The MN MUST indicate the PRF algorithm it used for HIK derivation and\n",
" > intends to use for HK derivation in the PRF field of the HKReq\n",
" > header. A value of 1 indicates HMAC-SHA1. That is the only value\n",
" > presently defined in this document.\n",
"\n",
"Why refer to HMAC-SHA1 here? If that is replaced in the future,\n",
"it would need to be done in multiple places.\n",
"\n",
" > The HKReq sent by the MN MUST include the MN-AAA MAC Mobility Sub-\n",
" > Option. The AUTH Value in the MN-AAA MAC option shall be calculated\n",
" > as follows:\n",
"\n",
" > AUTH = prf(HIK, MH Data)\n",
"\n",
"You mean MAC and not PRF?\n",
"\n",
" > The prf used is indicated in the Algorithm Type included in the MAC\n",
" > Mobility Option. At the time of writing of this document, the only\n",
" > value defined is 1 for HMAC-SHA1.\n",
"\n",
"\n",
"No need for the second sentence. In the first sentence, replace the\n",
"prf with MAC.\n",
"\n",
" > MH Data is the content of the Mobility Header up to and including the\n",
" > Algorithm Type field of this option.\n",
"\n",
" > The MN should send a new HKReq before the expiry of the lifetime of\n",
" > the key obtained via the previous HKReq. Once a MN obtains a new\n",
" > key, it MUST discard the old key and use the new key for\n",
" > authenticating the FBU.\n",
"\n",
" > If the MN does not receive a response after a configured timeout\n",
" > (HKEY_REQ_TOUT), it SHOULD retransmit the request for a maximum of\n",
" > N_HO_KEYTRIES. In each of the retry attempts, the MN MUST use the same\n",
" > message ID. The default value of N_HO_KEYTRIES is 3.\n",
"\n",
"\n",
"Doesn't say anything about incrementing SEQ# in the retries; perhaps\n",
"that comes later?\n",
"\n",
"\n",
"\n",
"6.2.2. Receiving and Processing Handover Key Response Messages\n",
"\n",
" > Upon receiving a successful HKResp message from the AR, the MN MUST\n",
" > ensure that the message contains the MN-AR MAC mobility option. If\n",
" > not, it MUST silently discard the message. The MN MUST ensure that\n",
" > the Message ID matches with that of the corresponding HKReq. If\n",
" > there is a mismatch, it MUST drop the packet. The MN MUST compute\n",
" > the handover key using the keying material contained in the HKResp\n",
" > message. The key is computed as described in Section 6.1. It is\n",
" > repeated here for completion.\n",
"\n",
"Not required at all, I'd delete the derivation thing. Why repeat\n",
"after only a few pages of the spec? Furthermore, if something changes,\n",
"the updates may make this inconsistent in the document.\n",
"\n",
" > HK = gprf+ (HMK, AAA Nonce| MN Nonce | AR ID | MN ID | \"Handover\n",
" > Key\"), where | indicates concatenation\n",
"\n",
" > The AR ID is the IP address of the AR. The MN ID is the NAI of the\n",
" > MN that is sent by the MN in the HKReq message. The AAA Nonce is a\n",
" > nonce generated by the AAA server and included in the HKResp received\n",
" > by the MN. The MN Nonce is the nonce generated by the MN and\n",
" > included in the HKReq message. The PRF used is indicated by the PRF\n",
" > field of the HKResp message.\n",
"\n",
" > The MN MUST verify the AUTH Value in the MN-AR MAC mobility option\n",
" > using the HK derived. The MAC algorithm used is the one specified in\n",
" > the Algorithm Type field of the MN-AR MAC mobility sub-option. If\n",
" > the MAC algorithm is not supported by the MN, it MUST discard the\n",
" > message. If the AUTH Value verification fails, the MN MUST silently\n",
" > discard the message.\n",
"\n",
" > Upon successful processing of the HKResp and derivation of the valid\n",
" > HK, the MN MUST store the SPI and lifetime associated with the key,\n",
" > as sent in the HKResp.\n",
"\n",
"At this point the reader does not know what's in the HKResp\n",
"messages, so the above description seems to be in vacuum. We know only\n",
"that there is a MAC and the associated verification rules, and the above\n",
"line says something about SPI and lifetime etc., without any explanation\n",
"of the use of those fields.\n",
"\n",
"6.2.2.1. Error Processing\n",
"\n",
" > If the MN receives a HKResp with the Code set to\n",
" > ADMINISTRATIVELY_PROHIBITED, the MN MUST NOT send any more HKReq\n",
" > messages via that AR. If the MN receives a HKResp with the Code set\n",
" > to HKE_NOT_SUPPORTED, it SHOULD use a different key exchange protocol\n",
" > to derive the handover key. If the code is set to CoA_AUTH_FAILED,\n",
" > it MAY retransmit the HKReq after a random time greater than\n",
" > HK_RETRY_INTERVAL. However, the MN MUST NOT retransmit the HKReq\n",
" > more than N_HO_KEYTRIES. If the code is set to MN_AAA_AUTH_REQD, the\n",
" > MN MUST send a new HKReq with the MN AAA Authentication Option\n",
" > included. If the code is set to INVALID_AUTH_VALUE, the MN MUST NOT\n",
" > send any more HKReq messages via that AR. If the code is set to\n",
" > TIMESTAMP_REQD, the MN SHOULD send another HKReq using the Timestamp\n",
" > mobility option. If the code is set to INVALID_TIMESTAMP, the MN\n",
" > SHOULD send another HKReq with the adjusted timestamp value. If the\n",
"\n",
"AUTH and MAC are interchangeable and I don't like that confusion.\n",
"It appears that you are saying an MN_AAA_AUTH_REQD might be sent, but it\n",
"is not clear when the AAA server might send that error code since the\n",
"rule might be that it MUST discard messages without MACs silently!\n",
"Actually that is not clear either! I'd definitely say that the AAA\n",
"server MUST NOT send INVALID_AUTH_VALUE, as that introduces a\n",
"vulnerability: the AAA server becomes an Oracle for an attacker.\n",
"\n",
"\n",
" > code is set to INVALID_PRF_ALG, the MN MAY send another HKReq with\n",
" > the PRF Algorithm specified in the HKResp message. If the code is\n",
" > set to INVALID_MAC_ALG in the MN-AAA MAC sub-option, the MN MAY send\n",
" > another HKReq with the Algorithm Type value set to that found in the\n",
" > corresponding field of the HKResp message.\n",
"\n",
"6.2.3. Returning to an Access Router\n",
"\n",
" > When a MN attaches to an AR with which the MN believes it has a\n",
" > shared Key (for example the MN has an unexpired key it obtained when\n",
" > it was Previously associated with the access router), the MN MAY\n",
"\n",
"s/Previously/previously\n",
"\n",
" > request that the AR utilize the Key as long as the key has not\n",
"\n",
"s/Key/key\n",
"\n",
" > expired. If the MN wishes to re-use the key, it MUST do so only\n",
" > after verifying that by sending a HKReq with the verify flag set.\n",
" > The message ID field in the HKReq is a freshly generated random\n",
" > number. However the sequence number value in this HKReq MUST be\n",
" > greater than the sequence number in previous HKReq sent to the AR\n",
" > corresponding to the key. The MN MUST include an MN-AR MAC mobility\n",
" > option in HKReq with verify flag set. The MN-AR MAC Option MUST be\n",
" > the last option included in the HKReq. The verify procedure serves\n",
" > two purposes: (1) It is used to verify that the AR still has the key\n",
" > and allows use of the key until the lifetime of the key and (2) It\n",
" > enables the MN CoA to be bound to the key. If the verify request is\n",
" > not successful the MN SHOULD create a new MN-AR handover key by\n",
" > sending a HKReq with the MN-AAA Auth option.\n",
"\n",
"Several problems with the text above:\n",
"1. The text about message IDs and sequence numbers is not clear,\n",
"especially the part about message IDs. For the SEQ part, I am still\n",
"looking for a justification of the need for it, but perhaps it is\n",
"elsewhere in the document.\n",
"2. I guess the MAC is for PoP of the key; perhaps that might be stated.\n",
"3. It is not clear how the MN CoA is bound to the key; perhaps it is to\n",
"non-FMIPv6-challenged folks. A sentence on it might help!\n",
"\n",
"\n",
" > If the MN receives a HKResp with the code set to HK_VERIFY_FAILED, it\n",
" > SHOULD send another HKReq with the MN-AAA MAC Mobility sub-option to\n",
" > obtain a new handover key via the AAA server.\n",
"\n",
"I see references to MN-AAA Auth option and MN-AAA MAC Mobility\n",
"sub-option; I guess they are different. Please clarify.\n",
"\n",
" > 6.3. Access Router Considerations\n",
"\n",
" > If the HKReq message does not contain the MN ID Mobility Option, the\n",
" > AR MUST silently discard the message. If the HKReq does not have the\n",
" > verify bit set, the AR does the following.\n",
"\n",
" > If the HKReq message does not contain the MN-AAA MAC Mobility sub-\n",
" > option, it MUST silently discard the message.\n",
"\n",
" > The AR MUST first determine if it has a pending request from the MN\n",
" > with the same message id. If so and if the AR has already received a\n",
" > AAA response corresponding to the HKReq, the AR SHOULD retransmit the\n",
" > HKResp to the MN. If the received message has the same message ID\n",
" > and the sequence number as before, the AR MUST drop the packet. For\n",
" > further protection from replays, the rate of retransmissions of\n",
" > responses to MN MUST NOT be more than a preconfigured RETRANS_RATE.\n",
" > If the AR already forwarded this message to the home AAA Server but\n",
" > has not yet received a response from AAA server, the AR MUST silently\n",
"\n",
"\n",
"\n",
" > discard the retransmitted request from the MN. Note that the AAA\n",
" > protocol should independently perform its own retransmission. If\n",
" > this is a new HKReq, the AR MUST check to determine if the MN-AAA MAC\n",
" > Mobility sub-option has been included in this message by the MN. If\n",
" > it has been included, then the AR MUST forward the request to the\n",
" > home AAA Server.\n",
"\n",
"Usually retransmission means (at least in certain protocols that I am aware of)\n",
"that the sender doesn't change anything in the message. In this case, the MN\n",
"is really preparing the message again and sending it.\n",
"\n",
"I believe that only one of message-id and seq fields is needed.\n",
"\n",
"With your current semantics, the processing rule about the same msg ID\n",
"and seq number comes before the same message ID, and \"response\n",
"available\" case.\n",
"\n",
" > If the AR expects timestamp-based replay protection, it MUST return a\n",
" > HKResp with the error code set to \"TIMESTAMP_REQD\".\n",
"\n",
" > If it is a new HKReq, the AR SHOULD send a request to the AAA server\n",
" > only after successful validation of the CoA. Section 7.4 provides\n",
" > additional discussion on the issues associated with address\n",
" > validation and some possible options for address validation. If the\n",
" > AR fails to verify the CoA, it MUST send a HKResp with the code set\n",
" > to CoA_AUTH_FAILED.\n",
"\n",
"Perhaps a sentence on what validation of the CoA would mean might be\n",
"useful here. I am not sure it's a good idea to report AUTH failures.\n",
"\n",
" > The AAA message is constructed using the appropriately defined\n",
" > attributes (illustrated in Appendix A for RADIUS and Appendix B for\n",
" > Diameter). While creating the AAA request message, the AR MUST\n",
" > include the NAS-IP-Address AVP in the AAA message (e.g. RADIUS\n",
" > Access Request) sent to the AAA server. The AR MUST use its IP\n",
" > address as seen by the MN in this AVP. This will be used as the AR\n",
" > ID in deriving the handover key.\n",
"\n",
" > The domain name in the NAI is used to determine the AAA server to be\n",
" > contacted.\n",
"\n",
" > If the AR receives a successful AAA response (e.g., RADIUS Access\n",
" > Accept) message from the AAA server, it MUST store the handover key\n",
" > received from the AAA server along with the CoA and MN ID and index\n",
" > it additionally with an SPI. The AR MUST send the SPI, AAA Nonce and\n",
" > lifetime in the RADIUS message in the HKResp message to the MN. The\n",
" > AR MUST include a MAC of the message created using the HK in the\n",
" > MN-AR MAC Mobility Sub-Option carried in the HKResp message.\n",
"\n",
" > The AUTH Value in the MN-AR MAC option is created as follows.\n",
"\n",
" > AUTH = prf (HK, MH Data)\n",
"\n",
" > The prf used is indicated in the Algorithm Type included in the MAC\n",
" > Mobility Option. At the time of writing of this document, the only\n",
" > value defined is 1 for HMAC-SHA1.\n",
"\n",
" > MH Data is the content of the Mobility Header up to and including the\n",
" > Algorithm Type field of this option.\n",
"\n",
"\n",
"Starting at \"The AUTH Value\", this seems like repetition from\n",
"earlier and I recall stating several issues with it. First, I'd suggest\n",
"deleting this. If not, please see below:\n",
"\n",
"1. Please replace AUTH with MAC. That'd result in changes in a few\n",
"other places.\n",
"2. Replace prf with MAC.\n",
"3. No need to specify HMAC-SHA1 here again.\n",
"\n",
"\n",
" > If the AR received an MN-AAA MAC sub-option from the AAA server, it\n",
"\n",
"\"If\": is this really an option? Is the MN-AAA MAC an option?\n",
"\n",
"\n",
" > MUST include that in the HKResp to the MN.\n",
"\n",
" > If the AR receives an unsuccessful AAA response (e.g., RADIUS Access\n",
" > Reject) message from the AAA server, the AR MUST send an appropriate\n",
" > error code to the MN in the HKResp message.\n",
"\n",
" > The AR SHOULD buffer the information to be sent to the MN for a\n",
" > maximum value of HKEY_REQ_TOUT, so that it can be retransmitted upon\n",
" > receiving a HKReq with the same Message ID and incremented sequence\n",
" > number.\n",
"\n",
" > If an AR receives a AAA response message corresponding to a MN that\n",
" > is no longer connected to it, the AR SHOULD silently discard it.\n",
"\n",
" > After retransmission timeout, if the AR does not receive a response\n",
" > from the AAA server, it MUST remove all state associated with the MN\n",
" > and MUST NOT send a response to the MN.\n",
"\n",
" > When the neighbor cache entry for the CoA expires, the AR MUST\n",
" > disassociate the key and the corresponding CoA. When the lifetime of\n",
" > the key expires, the AR should remove the SPI.\n",
"\n",
" > 6.3.1. Returning Mobile Node\n",
"\n",
" > If a mobile node believes that it shares a handover key with a valid\n",
" > lifetime with the AR, it may send a HKReq with the 'V' bit set. If\n",
" > so, the AR MUST use the MN ID to lookup the MN and obtain the key.\n",
"\n",
"\n",
"s/MN ID/SPI\n",
"\n",
" > If a key is present the AR MUST verify the AUTH value in the MAC\n",
" > option. If it is valid, it MUST verify that the sequence number in\n",
" > the HKReq is greater than the value in the sequence number field in\n",
" > the previously received HKReq from the MN for the same key. This\n",
" > ensures that the message is not a replay of a previous message with a\n",
" > verify bit set. If the sequence number check fails the AR MUST\n",
" > silently discard the message.\n",
"\n",
"Is the assumption on replay protection using the \"same msg-id, diff\n",
"seq number\" that the AR stores the entire message that it has forwarded\n",
"to the AAA server and compares it or does it just verify the msg-id and\n",
"respond back?\n",
"\n",
" > If the sequence number check is successful and computed MAC matches\n",
" > the AUTH value included in the HKReq message, the AR SHOULD verify\n",
" > that the IP address is valid and is not claimed by any other node.\n",
" > If that procedure succeeds, the AR MUST send a response with the\n",
" > verified bit set including a MAC of the response. Also, the AR MUST\n",
" > record the CoA of the MN as the IP address associated with the\n",
" > handover key for that MN, in addition to the MN ID (NAI). If the\n",
" > MN-AR MAC Mobility Sub-Option is not present in the HKReq in this\n",
" > case, the AR MUST drop the message and return a HKResp with the error\n",
" > code set to \"MN_AR_MAC_REQD\". Also, if the option was included but\n",
" > the AR was unable to verify the MAC, it MUST drop the message and\n",
" > return a HKResp with the error code set to \"INVALID_AUTH_Value\". The\n",
"\n",
"MAC_REQD is ok, but INVALID_AUTH MUST NOT be sent. BTW, why is it\n",
"invalid AUTH and not MAC? I may have asked that Q already.\n",
"\n",
"\n",
" > rate of transmissions of responses to MN MUST not be more than a\n",
" > preconfigured TRANS_RATE. If the AR does not have the key and if the\n",
" > HKReq does not have the MN-AAA MAC Option, it MUST drop the message\n",
" > and return a HKResp with the error code set to \"HK_VERIFY_FAILED\".\n",
"\n",
"The HK_VERIFY as useful as it might be, should be generic and should\n",
"not indicate whether the MAC failed, but instead should say something\n",
"like the AR wants to force the MN do full authentication\n",
"\n",
" > If the AR does not have the key and the HKReq included the MN-AAA MAC\n",
" > Mobility Sub-Option, the AR MAY process the HKReq as though the 'V'\n",
" > bit was not set.\n",
"\n",
" > 6.4. AAA Server Considerations\n",
"\n",
" > A description of the actual AAA attributes is included in Appendix A\n",
" > and Appendix B. The text in the Appendices are provided for\n",
"\n",
"s/text/attributes perhaps\n",
"\n",
" > illustration only and these are expected to be specified in separate\n",
" > documents. This section provides a brief description of the\n",
" > operation of the AAA server for this protocol. The discussion is\n",
" > explained with RADIUS as the example AAA protocol, but applies\n",
" > equally to Diameter as well.\n",
"\n",
" > If the MN cannot be authenticated by the AAA server, it MUST silently\n",
" > discard the HKReq message. If authorization failed for the MN to use\n",
" > FMIPv6 at the AR it is visiting, the AAA server MUST return a\n",
" > response with the code set to ADMINISTRATIVELY_PROHIBITED.\n",
"\n",
" > If the AAA server expects to use a PRF other than the one indicated\n",
" > in the HKReq message, it MUST return an error set to INVALID_PRF_ALG\n",
" > and set the PRF field in the HKResp to indicate the algorithm that\n",
" > must be used. The AAA server, in this case, MUST include an MN-AAA\n",
" > MAC option with the AUTH Value computed using the HIK.\n",
"\n",
"So I guess the MN-AAA MAC doesn't need to be present all the time?\n",
"That's not very clear\n",
"\n",
" > The AAA server MUST derive a Handover Integrity Key (HIK) from the\n",
" > Handover Master Key (HMK) for the MN as specified in Section 6.1.\n",
"\n",
" > If the AAA server is expecting timestamp-based replay protection in\n",
" > the HKReq, it MUST send an error set to TIMESTAMP_REQD in response to\n",
" > an HKReq that does not contain the Timestamp mobility option. If the\n",
" > HKReq contains the Timestamp mobility option, it MUST be processed in\n",
" > accordance with Section 5.2.3. If the timestamp sent by the MN does\n",
" > not match its own timestamp, the AAA server MUST send an error with\n",
" > the code INVALID_TIMESTAMP and include its timestamp in accordance\n",
" > with Section 5.2.3. The AAA server, in this case, MUST include an\n",
" > MN-AAA MAC option with the AUTH Value computed using the HIK.\n",
"\n",
" > Upon receiving a handover key request, the AAA server MUST verify\n",
" > that the MN-AAA MAC Mobility Sub-Option is present in the message.\n",
" > If it is absent, the AAA server MUST silently discard the message.\n",
" > Otherwise, the AAA server MUST verify the AUTH Value in the option.\n",
" > If it is invalid, the AAA server MUST silently discard the message.\n",
"\n",
"\n",
"\n",
" > If it is valid, the AAA server must proceed with the handover key\n",
" > generation described in Section 6.1. If the MAC Algorithm used by\n",
" > the MN is unacceptable, the AAA server SHOULD return an error of type\n",
" > INVALID_MAC_ALG, including an MN-AAA MAC option with the algorithm\n",
" > set to the desired value and the AUTH Value computed using the HIK.\n",
"\n",
" > If the NAS-IP-Address AVP was not included in the request, the AAA\n",
" > server MUST return an error to the AR, indicating that the NAS-IP-\n",
" > Address is required.\n",
"\n",
" > The AAA server MUST send a response back to the AR, including the AAA\n",
" > Nonce as well as the derived HK. Note that the AAA protocol is\n",
" > expected to provide its own security between the AR and the AAA\n",
" > server for purposes of encrypting the HK. The AAA server SHOULD\n",
" > include a lifetime for the HK in the RADIUS Access Accept message.\n",
" > The AAA server is, however, not required to store the key or the\n",
" > lifetime.\n",
"\n",
" > 6.5. Indirect MN-AR Handover Key Exchange\n",
"\n",
" > The MN may wish to derive a handover key with an AR when it is not\n",
" > directly attached to that AR. This may happen in case the MN is\n",
" > using FMIPv6 service with pAR as its anchor (while moving rapidly\n",
" > across ARs, for instance) and its handover key with the pAR is about\n",
" > to expire. In this case, the MN may need to refresh the key via the\n",
" > nAR it is attached to.\n",
"\n",
" > To establish a new handover key with an AR, the MN simply sends HKReq\n",
" > message destined to that AR. The CoA for such indirect refreshes\n",
" > SHOULD be set to NULL. If the CoA is non-NULL, the pAR MUST check if\n",
" > the CoA provided in the HKReq is the same address that is tied to the\n",
" > NAI provided by the MN. If that is not the case the pAR MUST reject\n",
" > the HKReq. If the checks are valid, the pAR contacts the AAA server\n",
" > to establish a handover key as described before.\n",
"\n",
"\n",
" > 7. Security Considerations\n",
"\n",
" > This section describes the security considerations for the protocol\n",
" > for establishing handover keys specified in this document. The\n",
" > messages described in this document are intended to allow the\n",
" > establishment of a security association between the mobile node and\n",
" > access router for fast handoff purposes. The protocol is loosely\n",
" > based on the Mobile IP-AAA model [12] where the MN-HA security\n",
" > association is derived using a AAA server.\n",
"\n",
" > The handover key protocol described in this document transports the\n",
" > nonce from the AAA server to the MN and the MN derives its own\n",
"\n",
"\n",
"\n",
" > handover key using the nonce and other parameters.\n",
"\n",
" > The proposed protocol uses an NAI-like identity of the MN as the\n",
" > identity to derive the handover keys between the MN and AR. This\n",
" > protocol does not provide active or passive user identity\n",
" > confidentiality. If such confidentiality is desired, a service\n",
" > specific identity must be derived as part of the HMK bootstrapping\n",
" > procedure.\n",
"\n",
"7.1. Strength of the HMK\n",
"\n",
" > The protocol relies on the HMK shared between the mobile node and the\n",
" > AAA server for handover key derivation. It also relies on the\n",
" > security of the AAA protocol (RADIUS or Diameter) used between the AR\n",
" > and the AAA server for HK distribution to the AR.\n",
"{**What AAA channel properties do you require? Confidentiality,\n",
"integrity and replay protection?}\n",
"\n",
" > The Security Associations resulting from use of this protocol do not\n",
" > offer any higher level of security than what is already implicit in\n",
" > use of the AAA Security Association between the mobile node and the\n",
" > AAA server. In order to deny any adversary the luxury of unbounded\n",
" > time to analyze and break the HMK, it must be refreshed periodically.\n",
" > The provisioning and refreshing of the HMK in the MN and AAA server\n",
" > is outside the scope of this document.\n",
"\n",
"7.2. Strength of the HIK and HK\n",
"\n",
" > The protocol allows the derivation of the Handover Integrity Key\n",
" > (HIK) and the Handover Key (HK) from the HMK. The HIK is shared\n",
" > between the MN and the AAA server and used in integrity protection of\n",
" > the HKReq message and in implicit authentication of the MN. The AUTH\n",
" > value in the MAC option of the message is calculated using the HIK,\n",
" > allowing integrity protection. By verifying proof of possession of\n",
" > the valid HIK, the MN is authenticated by the AAA server.\n",
"\n",
" > The Handover Key (HK) is shared between the MN and the AR and is used\n",
" > in integrity protection of messages between the MN and the AR. The\n",
" > AR uses the HK to compute the AUTH value in the HKResp message to the\n",
" > MN. Also, when the MN sends a HKReq to the AR with the MN-AR MAC\n",
" > Mobility Sub-Option, it uses the HK to derive the AUTH value in it.\n",
" > Subsequently, when the MN and AR exchange FMIPv6 signaling messages\n",
" > (FBU/FBAck), the HK is used to protect the signaling.\n",
"\n",
"The HK is known to the AAA server and the AAA server can spoof any\n",
"of these messages. In other words, there is no MSK-TSK like derivation\n",
"procedure here. Do you guys want to talk about that?\n",
"\n",
" > This protocol allows the PRF used for key derivation to be indicated\n",
" > by the MN in the HKReq message. Also, the AAA server may choose to\n",
" > deny the usage of the chosen PRF by specifying a different PRF in the\n",
" > HKResp messages. Currently, HMAC-SHA1 is the only PRF for which a\n",
" > value has been defined in this document. Future documents may define\n",
" > more PRF types and values. The choice of the PRF must be done in\n",
"\n",
"\n",
"\n",
" > keeping with the security properties of the desired key and the\n",
" > desired level of security. Also, the MAC algorithm used in the\n",
" > creation of the AUTH value must be taken into consideration to\n",
" > determine the length of the HK and the HIK required.\n",
"\n",
"7.3. Replay Protection\n",
"\n",
" > The proposed protocol uses a sequence number in combination with a\n",
" > message ID to detect retransmissions and replays at the AR. It also\n",
" > allows the use of timestamp based absolute replay protection between\n",
" > the MN and the AAA server. Using the sequence number alone provides\n",
" > limited replay protection. Replay protection is provided as long as\n",
" > there is state corresponding to an MN at the AR. An attacker may,\n",
" > however, cache a HKReq message sent on the link between an MN and AR\n",
" > and replay it at a sufficiently later time when the AR has no state\n",
" > for that MN. In this case, the AR will end up sending a AAA request\n",
" > to the AAA server. The HKReq will be successfully processed by the\n",
" > AAA server in this case, since the authentication data will be valid\n",
" > (as the attacker has not modified anything in the message). The AAA\n",
" > server will create a HK corresponding to this message and will\n",
" > provide it to the AR. However, it must be noted that the MN cannot\n",
"\n",
"The MN or the attacker?\n",
"\n",
" > obtain the HK, because, without the HMK, it cannot derive the key\n",
" > from the nonce. Hence, this scenario does not lead to an adversary\n",
" > using FMIP services on the AR. But, it does result in some minimal\n",
" > resource consumption at the AAA server (for computing the handover\n",
" > key). The assumption from an accounting perspective here is that\n",
" > accounting at the AAA server will not be triggered until the MN\n",
" > actually starts using the FMIP service (in other words, until the MN\n",
" > sends an FBU to the AR). If accounting for FMIPv6 is started based\n",
" > on when the handover key is derived, this issue could result in an MN\n",
" > getting charged for FMIP service due to an adversary. To provide\n",
" > absolute replay protection, the use of a timestamp-based approach\n",
" > using the Timestamp mobility option is recommended.\n",
"\n",
"I would have thought the need for message-ID *and* the seq #\n",
"would've been explained here. I just strongly to remove the seq#. It\n",
"has very limited use for an interestingly limited adversary.\n",
"\n",
" > 7.4. IP Address Authorization\n",
"\n",
" > For FMIPv6 operation, the access router must ensure that a mobile\n",
" > node cannot redirect traffic belonging to any other node. For this\n",
" > purpose, the access router must bind the handover key of a mobile\n",
" > node to its care-of-address. The AR must ensure that the CoA claimed\n",
" > by the MN does not belong to any other node.\n",
"\n",
" > IP address authorization may be done in different ways in different\n",
" > networks. For example, where stateful address assignment is used, it\n",
" > is possible for a DHCP server to securely notify the AR (DHCP Relay\n",
" > Agent potentially) of the IP address assigned to the MN. The AR may\n",
" > note the IP address to MN ID mapping at that time. Also, when IPv6CP\n",
"What's IPv6CP, first use, please expand\n",
" > is used, it is possible for the AR to know the same mapping.\n",
"\n",
" > When stateless autoconfiguration is used by the MN to obtain a CoA,\n",
" > SeND may be used to protect against the threats of ND in general. It\n",
"\n",
"Need a reference for SeND; expand ND\n",
" > must be noted that this protocol is not attempting to solve the\n",
" > general threats of ND itself. Some other mechanisms may also be\n",
" > available for IP address authorization. For instance, in cellular\n",
" > networks that do not have a broadcast link between the MN and a base\n",
" > station, the packets coming on the link between the MN and BS can be\n",
" > considered valid, since it is an authenticated point-to-point link to\n",
" > the MN. In such a case, SeND is not required to achieve IP address\n",
" > authorization, even for of stateless IP address autoconfiguration. .\n",
"\n",
"Not sure whether SEC considerations are complete given that the \n",
"threat model is not available.\n",
"\n",
"8. IANA Considerations\n",
"\n",
"<snip>\n",
"\n",
"\n",
"_______________________________________________\n",
"Mipshop mailing list\n",
"Mipshop@ietf.org\n",
"https://www1.ietf.org/mailman/listinfo/mipshop\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"82attendees 2011-11-15 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"cnrp 2000-05-09 00:00:00\n",
"On Sun, May 07, 2000 at 04:48:53PM -0400, Leslie Daigle wrote:\n",
"> My own perspective is that:\n",
">\n",
"> . we have the service/server distinction currently, but\n",
"> . this is looking for an ability to acknowledge/express\n",
"> relationships between different services (administrative\n",
"> control) that cover the same space (dataset).\n",
">\n",
"> Answers for Roland? Comments?\n",
"\n",
"Hmm... Of what use would such an expression of relationships be\n",
"to the client? With DNS for example the concept of primary vs\n",
"secondary only has meaning to the servers and isn't representable\n",
"in NS records. Do we need that in the client here or is it\n",
"enough to have a service/server distinction?\n",
"\n",
"\n",
"> From: Roland Hedberg <roland@catalogix.ac.se>\n",
"> Subject: more CNRP\n",
"> To: leslie@THINKINGCAT.COM (Leslie Daigle)\n",
">\n",
"> Hi again,\n",
">\n",
"> Perhaps this should have been sent to the list instead of to you.\n",
 You can forwarded">
"> You can forward it if you want.\n",
">\n",
"> My problem is as follows:\n",
">\n",
"> If a CNRP server wants to return as a result for a query a set of sets\n",
"> of referrals.\n",
">\n",
 A example taken">
"> An example taken from an LDAP environment ( but don't let that bother\n",
"> you, read CNRP everywhere it says LDAP :-) ).\n",
">\n",
"> As a response to a LDAP query a server can return several 'groups' of\n",
"> referrals. Where the underlying assumption is that all referrals within\n",
"> one group are references to the same data ( masters and slaves ).\n",
">\n",
"> How would that be represented in a CNRP response ?\n",
"\n",
"By returning a referral that references a Service object that had\n",
"multiple Servers listed.\n",
">\n",
"> If I understand the CNRP DTD correctly, which I am not totally convinced of,\n",
"> then the definition of 'service' doesn't allow you to have several\n",
"> serviceURI's.\n",
"> Which I take to mean that you can not handle the kind of situation that I have\n",
"> described above.\n",
"\n",
"ServiceURI isn't really the URI of the actual server. ServiceURI is\n",
"an identifier only in that it allows you to differntiate one service (dataset)\n",
"from another. In the Service object there is only one ServiceURI but\n",
"multiple Server objects listed under the Servers tag.\n",
"\n",
"The question becomes: is that sufficient for your needs? Are you\n",
"going to something more robust than master/slave that needs to be\n",
"expressed to the client?\n",
"\n",
"-MM\n",
"\n",
"--\n",
"--------------------------------------------------------------------------------\n",
"Michael Mealling | Vote Libertarian! | www.rwhois.net/michael\n",
"Sr. Research Engineer | www.ga.lp.org/gwinnett | ICQ#: 14198821\n",
"Network Solutions | www.lp.org | michaelm@netsol.com\n",
"---------------------------------\n",
"> On Sun, May 07, 2000 at 04:48:53PM -0400, Leslie Daigle wrote:\n",
"> > My own perspective is that:\n",
"> >\n",
"> > . we have the service/server distinction currently, but\n",
"> > . this is looking for an ability to acknowledge/express\n",
"> > relationships between different services (administrative\n",
"> > control) that cover the same space (dataset).\n",
"> >\n",
"> > Answers for Roland? Comments?\n",
">\n",
"> Hmm... Of what use would such an expression of relationships be\n",
"> to the client? With DNS for example the concept of primary vs\n",
"> secondary only has meaning to the servers and isn't representable\n",
"> in NS records. Do we need that in the client here or is it\n",
"> enough to have a service/server distinction?\n",
"\n",
"It must be to the benefit of the client if it is told that a group of\n",
"servers all point to the same information, such that the client can choose\n",
"anyone of them when it wants to follow a referral.\n",
"And if the first choice is not reachable another from the group can be used.\n",
"\n",
"\n",
"> > How would that be represented in a CNRP response ?\n",
">\n",
"> By returning a referral that references a Service object that had\n",
"> multiple Servers listed.\n",
"\n",
"Ah, your correct.\n",
"So the following construct would be used:\n",
"\n",
"<service id=\"1\" >\n",
" <serviceURI=\"1.2.752.17.5.1\">\n",
" <servers>\n",
" <server>\n",
" <serverURI>ldap://ldap.umu.se/dc=umu,dc=se</serverURI>\n",
" </server>\n",
" <server>\n",
" <serverURI>ldap://ldap.sunet.se/dc=umu,dc=se</serverURI>\n",
" </server>\n",
" </servers>\n",
"</service>\n",
"\n",
"> ServiceURI isn't really the URI of the actual server. ServiceURI is\n",
 an identifier only">
"> an identifier only in that it allows you to differentiate one service (dataset)\n",
"> from another. In the Service object there is only one ServiceURI but\n",
"> multiple Server objects listed under the Servers tag.\n",
"\n",
"Now, I'm lost. If the serviceURI is just a identifier and the collection\n",
"of characters has no intrinsic meaning. Then a construct\n",
"like this (extract from example 6.3 in cnrp-02.txt) must be rather useless:\n",
"\n",
"<results>\n",
" <service id=\"2\">\n",
" <serviceURI>\"http://servers.acmecorp.co.uk</serviceURI>\n",
" </service>\n",
" <resource>\n",
" <referral>\n",
" <serviceRef id=\"2\">\n",
" </referral>\n",
" </resource>\n",
"</results>\n",
"\n",
"According to the text this piece should be a referral. But if it is\n",
"as you write, that the serviceURI is just a identifier and should not\n",
"be interpreted as a URI pointing to some information, then where is\n",
"the real referral.\n",
"\n",
"Shouldn't it at least have been:\n",
"\n",
"<results>\n",
" <service id=\"2\">\n",
" <serviceURI>\"http://servers.acmecorp.co.uk</serviceURI>\n",
" <server>\n",
" <serverURI>\"http://servers.acmecorp.co.uk/</serverURI>\n",
" </server>\n",
" </service>\n",
" <resource>\n",
" <referral>\n",
" <serviceRef id=\"2\">\n",
" </referral>\n",
" </resource>\n",
"</results>\n",
"\n",
"in order to be useful.\n",
"\n",
"> The question becomes: is that sufficient for your needs? Are you\n",
"> going to something more robust than master/slave that needs to be\n",
"> expressed to the client?\n",
"\n",
"Well, it could be useful to be able to order the servers like in SRV records.\n",
"Perhaps 'property' could be used since it is already in the DTD.\n",
"\n",
"<service id=\"1\" >\n",
" <serviceURI=\"1.2.752.17.5.1\">\n",
" <servers>\n",
" <server>\n",
" <serverURI>ldap://ldap.umu.se/dc=umu,dc=se</serverURI>\n",
" <property name=\"priority\">10</property>\n",
" </server>\n",
" <server>\n",
" <serverURI>ldap://ldap.sunet.se/dc=umu,dc=se</serverURI>\n",
" <property name=\"priority\">100</property>\n",
" </server>\n",
" </servers>\n",
"</service>\n",
"\n",
"\n",
"-- Roland\n",
"------------------------------------------------\n",
"Roland Hedberg phone : +47 23 08 29 96\n",
"Dalsveien 53 mobile(NO): +47 90 66 44 52\n",
"No-0775 Oslo mobile(SE): +46 70 520 420 3\n",
"Norway\n",
"---------------------------------\n",
"On Tue, May 09, 2000 at 03:04:23PM +0200, Roland Hedberg wrote:\n",
"> > On Sun, May 07, 2000 at 04:48:53PM -0400, Leslie Daigle wrote:\n",
"> > > My own perspective is that:\n",
"> > >\n",
"> > > . we have the service/server distinction currently, but\n",
"> > > . this is looking for an ability to acknowledge/express\n",
"> > > relationships between different services (administrative\n",
"> > > control) that cover the same space (dataset).\n",
"> > >\n",
"> > > Answers for Roland? Comments?\n",
"> >\n",
"> > Hmm... Of what use would such an expression of relationships be\n",
"> > to the client? With DNS for example the concept of primary vs\n",
"> > secondary only has meaning to the servers and isn't representable\n",
"> > in NS records. Do we need that in the client here or is it\n",
"> > enough to have a service/server distinction?\n",
">\n",
"> It must be to the benefit of the client if it is told that a group of\n",
"> servers all point to the same information, such that the client can choose\n",
 anyone of them">
"> any one of them when it wants to follow a referral.\n",
"> And if the first choice is not reachable another from the group can be used.\n",
"\n",
"Oh yes. That is useful. But what is not communicated to the client\n",
"is which one is the master and which one is hte slave. All the client\n",
"sees is multiple NS records with no ordering or any other semantics.\n",
"Let me clarify the question: of what use is it for the client to know\n",
"anything more than a list of servers all serving the same dataset equally?\n",
"Do you need to prioritize them (ala MX records)? Do you need to know\n",
"the master/slave relationship at the client level?\n",
"\n",
"\n",
"> > > How would that be represented in a CNRP response ?\n",
"> >\n",
"> > By returning a referral that references a Service object that had\n",
"> > multiple Servers listed.\n",
">\n",
 Ah, your correct">
"> Ah, you're correct.\n",
"> So the following construct would be used:\n",
">\n",
"> <service id=\"1\" >\n",
"> <serviceURI=\"1.2.752.17.5.1\">\n",
"> <servers>\n",
"> <server>\n",
"> <serverURI>ldap://ldap.umu.se/dc=umu,dc=se</serverURI>\n",
"> </server>\n",
"> <server>\n",
"> <serverURI>ldap://ldap.sunet.se/dc=umu,dc=se</serverURI>\n",
"> </server>\n",
"> </servers>\n",
"> </service>\n",
"\n",
"Yep...\n",
"\n",
"> > ServiceURI isn't really the URI of the actual server. ServiceURI is\n",
 > an identifier only">
"> > an identifier only in that it allows you to differentiate one service (dataset)\n",
"> > from another. In the Service object there is only one ServiceURI but\n",
"> > multiple Server objects listed under the Servers tag.\n",
">\n",
 Now, I'm lost.">
"> Now, I'm lost. If the serviceURI is just an identifier and the collection\n",
"> of characters has no intrinsic meaning, then a construct\n",
"> like this (extract from example 6.3 in cnrp-02.txt) must be rather useless:\n",
">\n",
"> <results>\n",
"> <service id=\"2\">\n",
"> <serviceURI>\"http://servers.acmecorp.co.uk</serviceURI>\n",
"> </service>\n",
"> <resource>\n",
"> <referral>\n",
"> <serviceRef id=\"2\">\n",
"> </referral>\n",
"> </resource>\n",
"> </results>\n",
">\n",
 According to the text">
"> According to the text this piece should be a referral. But if it is\n",
"> as you write, that the serviceURI is just an identifier and should not\n",
"> be interpreted as a URI pointing to some information, then where is\n",
"> the real referral?\n",
"\n",
"In this case yes, it has no meaning. But there are others where it may\n",
"since the ServiceURI uniquely identifies the 'dataset' being talked about.\n",
"In some cases the involved parties may already know who the servers are\n",
"but they just need to know they're talking about the same service.\n",
"Its difficult to express this kind of semantic in the DTD....\n",
"\n",
">\n",
"> Shouldn't it at least have been:\n",
">\n",
"> <results>\n",
"> <service id=\"2\">\n",
"> <serviceURI>\"http://servers.acmecorp.co.uk</serviceURI>\n",
"> <server>\n",
"> <serverURI>\"http://servers.acmecorp.co.uk/</serverURI>\n",
"> </server>\n",
"> </service>\n",
"> <resource>\n",
"> <referral>\n",
"> <serviceRef id=\"2\">\n",
"> </referral>\n",
"> </resource>\n",
"> </results>\n",
">\n",
"> in order to be useful.\n",
"\n",
"For a referral, yes. But for other places where the Service object\n",
"is used, no....\n",
"\n",
"> > The question becomes: is that sufficient for your needs? Are you\n",
"> > going to something more robust than master/slave that needs to be\n",
"> > expressed to the client?\n",
">\n",
"> Well, it could be useful to be able to order the servers like in SRV records.\n",
"> Perhaps 'property' could be used since it is already in the DTD.\n",
"\n",
"Yep. This is the #1 reason why the Server tag is allowed to hold properties.\n",
"Neat, huh? You could also create other properties used here such as\n",
"saying things like \"this server is for the pacific rim and this server is\n",
"for Europe\" (although how you say that reliably is beyond me at this point).\n",
"\n",
"> <service id=\"1\" >\n",
"> <serviceURI=\"1.2.752.17.5.1\">\n",
"> <servers>\n",
"> <server>\n",
"> <serverURI>ldap://ldap.umu.se/dc=umu,dc=se</serverURI>\n",
"> <property name=\"priority\">10</property>\n",
"> </server>\n",
"> <server>\n",
"> <serverURI>ldap://ldap.sunet.se/dc=umu,dc=se</serverURI>\n",
"> <property name=\"priority\">100</property>\n",
"> </server>\n",
"> </servers>\n",
"> </service>\n",
"\n",
"Ywp..\n",
"\n",
"-MM\n",
"\n",
"--\n",
"--------------------------------------------------------------------------------\n",
"Michael Mealling | Vote Libertarian! | www.rwhois.net/michael\n",
"Sr. Research Engineer | www.ga.lp.org/gwinnett | ICQ#: 14198821\n",
"Network Solutions | www.lp.org | michaelm@netsol.com\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"lsd 2010-07-09 00:00:00\n",
"A new IETF working group has been formed in the Real-time Applications and\n",
"Infrastructure Area. For additional information, please contact the Area\n",
"Directors or the WG Chairs.\n",
"\n",
"Loosely-coupled SIP Devices (LSD)\n",
"---------------------------------------------------\n",
"Current Status: Active Working Group\n",
"\n",
"Chairs:\n",
" Simon Pietro Romano <spromano@unina.it>\n",
"\n",
"Real-time Applications and Infrastructure Area Directors:\n",
" Gonzalo Camarillo <Gonzalo.Camarillo@ericsson.com>\n",
" Robert Sparks <rjsparks@nostrum.com>\n",
"\n",
"Real-time Applications and Infrastructure Area Advisor:\n",
" Gonzalo Camarillo <Gonzalo.Camarillo@ericsson.com>\n",
"\n",
"Mailing Lists: General Discussion: lsd@ietf.org\n",
" To Subscribe: https://www.ietf.org/mailman/listinfo/lsd\n",
" Archive: http://www.ietf.org/mail-archive/web/lsd/current/maillist.html\n",
"\n",
"Description of Working Group:\n",
"\n",
"Disaggregated media refers to the ability for a user to create a\n",
"multimedia session combining different media streams coming from\n",
"different devices under his or her control so that they are treated by\n",
"the far end of the session as a single media session. \n",
"\n",
"Generally, a given participant uses a single device to establish (or\n",
"participate in) a given multimedia session. Consequently, the SIP\n",
"signaling to manage the multimedia session and the actual media\n",
"streams are typically co-located in the same device. In scenarios\n",
"involving disaggregated media, a user wants to establish a single\n",
"multimedia session combining different media streams coming from\n",
"different devices under his or her control. This creates a need to\n",
"coordinate the exchange of the those media streams within the\n",
"multimedia session.\n",
"\n",
"There are a number of existing mechanisms that can be used to\n",
"coordinate different devices under user's control and to involve them\n",
"in the call (e.g. Message Bus (Mbus) [RFC3259], Megaco [ITU-T H.248.1]\n",
"and SIP 3pcc [RFC3725]). However, these mechanisms are intended to be\n",
"used in \"tightly coupled\" scenarios. The use of all those mechanisms\n",
"requires the presence of a \"master\" device. That is, at least one\n",
"among the different devices under the control of the same user must\n",
"support the control mechanism and be able to become a controller for\n",
"the other devices in the call. Moreover, the \"master\" device is\n",
"supposed to remain involved in the user's session for its entire\n",
"duration given that performing a handover of the master role is\n",
"typically cumbersome and sometimes impossible.\n",
"\n",
"The objective of this working group is to develop a framework for\n",
"disaggregated media in \"loosely-coupled\" scenarios, where no single\n",
"device needs to remain in the session for its entire duration and no\n",
"single device needs to act as a master. Coordination among devices in\n",
"this type of scenario is less tight than in the scenarios described\n",
"above since they do not assume central elements with complete\n",
"knowledge of the whole media session. While the framework may describe\n",
"how to use existing mechanisms (e.g., the SIP REFER method) to\n",
"coordinate devices, the working group will not develop new device\n",
"coordination mechanisms. The framework may identify the need for new\n",
"(non-device-coordination) mechanisms to enable the implementation of\n",
"loosely-coupled scenarios. In case the need for such new mechanisms is\n",
"identified, the working group will specify them.\n",
"\n",
"Specifically, the proposed working group will develop the following\n",
"deliverables:\n",
"\n",
"1. A framework document describing key considerations for the exchange\n",
" of disaggregated media in SIP. The document will include use cases\n",
" and examples. The document may indentify the need for new\n",
" mechanisms or extensions to existing mechanisms.\n",
"\n",
"2. Specifications of new mechanisms or extensions to existing\n",
" mechanisms if the need is identified in the framework.\n",
"\n",
"Goals and Milestones:\n",
"\n",
"Feb 2011 - Framework document sent to the IESG (Informational)\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"78attendees 2010-07-24 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"yot 2017-07-20 00:00:00\n",
"Dear all,\n",
"\n",
"This is a reminder for the YANG of THINGS Side meeting, which will take =\n",
"place *TODAY* (July 20th), 10AM-12PM CEST in room room Tyrolka, =\n",
"Mezzanine Level.\n",
"\n",
"The materials for the meeting are available online at the following =\n",
"address: https://github.com/Ixau/yot/tree/master/ietf99\n",
"\n",
"Remote presence will be made available thanks to Jitsi - just go to the =\n",
"address through your browser: https://jitsi.tools.ietf.org/yot=20\n",
"We will also try to take notes in the Etherpad: =\n",
"http://etherpad.tools.ietf.org:9000/p/yot (This is of course an =\n",
"informal meeting, so don=E2=80=99t expect detailed minutes)\n",
"\n",
"Best,\n",
"Alexander\n",
"---------------------------------\n",
"Dear all,\n",
"\n",
"I=E2=80=99d like to thank everyone who participated today - in person or =\n",
"remotely via Jitsi. The room was full and we=E2=80=99ve had a lot of =\n",
"interesting discussions on the place of the different YANG-related =\n",
"technologies in the constrained space. One of the take-aways is to =\n",
"specify the classes of =C2=AB Things =C2=BB which may be of interest - =\n",
"potentially adding classes which would not fall in the typical =\n",
"constrained space (but which are embedded systems). (And we have at =\n",
"least 3 IoT-centric YANG modules.. which we could probably discuss via =\n",
"this ML.)\n",
"\n",
"You can find the minutes at the following address: =\n",
"https://github.com/Ixau/yot/blob/master/ietf99/minutes\n",
"\n",
"Don=E2=80=99t hesitate to add new elements with your notes (via a pull =\n",
"request or by mail).\n",
"\n",
"Best,\n",
"Alexander\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"sframe 2020-11-20 00:00:00\n",
"Hi all,\n",
"\n",
"One point that was raised during the SFrame meeting at IETF 109 was whether\n",
"the MLS-SFrame integration should re-use the MLS \"secret tree\" structure.\n",
"\n",
"For those who might not be deep in MLS, the secret tree provides forward\n",
"secrecy for MLS messages sent within an epoch, without the need to generate\n",
"base keys for all participants [1]. In other words, if you have a group\n",
"where (a) only a few participants are speaking and (b) they ratchet their\n",
"key after each message, then the secret tree assures that compromise of any\n",
"group member with current state won't leak the keys for messages that have\n",
"already been sent/received.\n",
"\n",
"Let's consider three possible integrations:\n",
"\n",
"1. SFrame uses the same secret tree as MLS (export at the leaves)\n",
"2. SFrame exports a single secret, then makes its own secret tree\n",
"3. SFrame exports a single secret, then uses a simpler scheme (like the\n",
"current one)\n",
"\n",
"I would posit that (1) is not workable. That requires a tight coupling\n",
"between the MLS and SFrame stacks, which will often not be tractable, e.g.,\n",
"in situations where the media logic is in a separate thread or process from\n",
"the MLS logic.\n",
"\n",
"(2) only adds value over (3) if SFrame senders ratchet their keys.\n",
"Otherwise, there's no forward secrecy boundary; criterion (b) above doesn't\n",
"apply. The current MLS-SFrame document has no provision for ratcheting\n",
"within an epoch. We could do it, but it would require more bits of header\n",
"to send a \"generation\" as MLS does, to indicate how many times you've\n",
"ratcheted. It also seems like situations where you have mostly silent\n",
"participants are more rare in real-time cases than in messaging cases.\n",
"\n",
"So my preference would still be for (3), largely because intra-epoch\n",
"forward secrecy seems like a pretty secondary consideration here. If\n",
"intra-epoch forward secrecy is a problem people want to solve, then we\n",
"should do ratcheting, and we should do the secret tree.\n",
"\n",
"--Richard\n",
"\n",
"[1]\n",
"https://github.com/mlswg/mls-protocol/blob/master/draft-ietf-mls-protocol.md#secret-tree-secret-tree\n",
"---------------------------------\n",
"Thanks for the detailed write-up!\n",
"\n",
"I agree that option 1 is very problematic for a number of reasons and I don’t see a good reason to pursue it.\n",
"I think you hit the nail on the head when you asked whether we want into-epoch forward secrecy or not.\n",
"\n",
"In uses cases where call participants comes together ad-hoc as an ephemeral group, there is certainly little need for it. In other scenarios, where a fixed set of participants have reoccurring calls it might well be that there won’t be an epoch change for a considerable time (because no one joins or leaves the group). For these use cases option 2 & 3 come into play.\n",
"Option 2 solves the problem, but the solution comes at the price of implementation complexity.\n",
"Option 3 could be much simpler, namely it would be enough to use a unique session ID to derive a base key that is unique to the session:\n",
"\n",
"sframe_epoch_secret = MLS-Exporter(\"SFrame 10 MLS\", \"\", AEAD.Nk)\n",
"session_secret = HKDF-Expand(sframe_epoch_secret, session_id, AEAD.Nk)\n",
"sender_base_key[index] = HKDF-Expand(session_secret, encode_big_endian(index, 8), AEAD.Nk)\n",
"\n",
"This is easy to implement and efficient but a little more error-prone since forward secrecy will only be achieved if session_id is unique between sessions. Re-using the same value will not yield forward secrecy.\n",
"\n",
"Raphael\n",
"\n",
"> On 20 Nov 2020, at 16:17, Richard Barnes <rlb@ipv.sx> wrote:\n",
"> \n",
"> Hi all,\n",
"> \n",
"> One point that was raised during the SFrame meeting at IETF 109 was whether the MLS-SFrame integration should re-use the MLS \"secret tree\" structure. \n",
"> \n",
"> For those who might not be deep in MLS, the secret tree provides forward secrecy for MLS messages sent within an epoch, without the need to generate base keys for all participants [1]. In other words, if you have a group where (a) only a few participants are speaking and (b) they ratchet their key after each message, then the secret tree assures that compromise of any group member with current state won't leak the keys for messages that have already been sent/received.\n",
"> \n",
"> Let's consider three possible integrations:\n",
"> \n",
"> 1. SFrame uses the same secret tree as MLS (export at the leaves)\n",
"> 2. SFrame exports a single secret, then makes its own secret tree\n",
"> 3. SFrame exports a single secret, then uses a simpler scheme (like the current one)\n",
"> \n",
"> I would posit that (1) is not workable. That requires a tight coupling between the MLS and SFrame stacks, which will often not be tractable, e.g., in situations where the media logic is in a separate thread or process from the MLS logic.\n",
"> \n",
"> (2) only adds value over (3) if SFrame senders ratchet their keys. Otherwise, there's no forward secrecy boundary; criterion (b) above doesn't apply. The current MLS-SFrame document has no provision for ratcheting within an epoch. We could do it, but it would require more bits of header to send a \"generation\" as MLS does, to indicate how many times you've ratcheted. It also seems like situations where you have mostly silent participants are more rare in real-time cases than in messaging cases.\n",
"> \n",
"> So my preference would still be for (3), largely because intra-epoch forward secrecy seems like a pretty secondary consideration here. If intra-epoch forward secrecy is a problem people want to solve, then we should do ratcheting, and we should do the secret tree.\n",
"> \n",
"> --Richard\n",
"> \n",
"> [1] https://github.com/mlswg/mls-protocol/blob/master/draft-ietf-mls-protocol.md#secret-tree-secret-tree <https://github.com/mlswg/mls-protocol/blob/master/draft-ietf-mls-protocol.md#secret-tree-secret-tree>-- \n",
"> Sframe mailing list\n",
"> Sframe@ietf.org\n",
"> https://www.ietf.org/mailman/listinfo/sframe\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"6lowapp 2009-11-17 00:00:00\n",
"Hello All,\n",
"\n",
"I only just joined 6lowapp, so I caught up by reading the archives a bit. I didn't see any previous discussion of commissioning, besides a very quick security discussion. I apologize if this is thus already discussed somewhere I missed! I was working on the commissioning problem on the IPSO side, and was just directed to 6lowapp. \n",
"\n",
"I had some questions on the fundamental assumptions of commissioning:\n",
"\n",
"> 1) Duckling mode. One Device is put in a \"mother duck\" mode where it\n",
"> listens for broadcasts of Devices trying to imprint.\n",
"\n",
"I'm assuming this is based on http://www.cl.cam.ac.uk/~fms27/duckling/ (The Resurrecting Duckling). \n",
"\n",
"As described, the method isn’t highly secure. The main problems are:\n",
"\n",
"1)\tIf you just add whoever is in ‘imprint’ mode, it is easy for an attacker to add themselves to your network.\n",
"\n",
"2)\tHow to put nodes in duckling mode? The paper mentions maybe having a ‘master key’ who can do this.\n",
"\n",
"Wi-Fi Protected Setup (WPS) helps get around some of these problems by having a push-button and LED. The button puts the node into either mother or duckling mode (depending on node type), and an LED lets the user see which two nodes are going to connect. \n",
"\n",
">2) Out of Band Key mode. Each Device in manufactured with an initial\n",
"> symmetric key physically printed on the side in numeric and barcode\n",
"> form...\n",
"\n",
"This is similar to WPS’s PIN-based entry. The issue with that was many nodes which could run 6LoWPAN will physically be too small to have labels legibly printed on them. Further having labels which must agree with a preprogrammed key would increase manufacturing cost, and there is a problem with labels being knocked off. WPS gets around this by having nodes being able to generate their own PIN and displaying it on an LCD, but most 6LOWPAN nodes would be too cheap for this.\n",
"\n",
"There is a security threat too: let’s say you have a ‘secure’ network. An attacker has physical access to your end nodes, but not to the central station which authorizes new nodes to come online. For many application this would be a reasonable assumption I would think. They could just put a label with a different key on your end node, then drain the batteries from the node or do something else that would force a maintenance operation. When the node is being ‘fixed’ it will be put back on the network, but really you’ve just opened up your network to an attacker.\n",
"\n",
"> 3) Certificate mode. Each Device is manufactured with a certificate\n",
" > with an asymmetric key. The fingerprint of the certificate is printed\n",
"\n",
"Similar issues to labels above, but since it’s optional it could be deployed where manufactures specifically see it as an advantage. The issue with the first two is they are mandatory.\n",
"\n",
"> no public key cryptography to support\n",
"> modes 1 and 2 and additional crypto profile with public key\n",
"> cryptography for support of mode 3\n",
"\n",
"I also see that only mode 3 will use Public-Private exchange. Should method 1 not require a public-private exchange, as I know of no other way to secure a hostile channel? You can use ECC encryption which is pretty light-weight (less than 3K ROM, 300 bytes SRAM).\n",
"\n",
"If a node is too small to run ECC it has no business doing an In-Band key exchange. Passive listening on RF networks is far too easy and impossible to detect. Man In The Middle and other attacks at least need the attacker to do something detectable mostly. It’s not just an issue of someone by chance listening when you do the setup. The problem is they could easily purposely force one node off the network by selective jamming, and waiting until maintenance personal reset or replace the node and do a new key exchange. \n",
"\n",
"Was any consideration given to an Out Of Band (OOB) exchange method? Something like using IR (either IrDA or discrete IR), or even a very simple physical connection. It would almost guarantee a secure key exchange, have little code overhead, and physical connection can be used to power parasitically-powered nodes which might not be able to go through lengthy configuration processes. As well it would lend itself to a small ‘widget’ being used to authorize nodes, such as a cheap pen-sized tool with some smarts in it. \n",
"\n",
"Regards,\n",
"\n",
" -Colin O’Flynn\n",
"\n",
"PS: I have some information I wrote down for IPSO about Security problems & Commissioning problems I could try to post somewhere 6LOWAPP would have access to them as well.\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"80attendees 2011-03-23 00:00:00\n",
"rpelletier@isoc.org:\n",
"> While we're talking about T-Shirts, don't forget to upload a picture\n",
"> to <http://www.ietf.org/ietf25/gallery.py>\n",
"\n",
"> Oldest one so far is IETF 32\n",
"\n",
"Arghh! Claudio, you beat me to the IETF 28 one! ;-)\n",
"\n",
"... but the caption under the picture is incorrect. IETF 28 was in the\n",
"autumn of 1993, not 1992. ;-)\n",
"\n",
"I guess my only remaining trump is the limited edition official terminal\n",
"room staff T-shirt from IETF 33 (July 1995). I'll upload soon. ;-)\n",
"\n",
"... but the person you really need to talk to is Bill Manning,\n",
"authorized T-shirt master collector. ;-)\n",
"\n",
"\t\t\t\tCheers,\n",
"\t\t\t\t /Liman\n",
"#----------------------------------------------------------------------\n",
"# Lars-Johan Liman, M.Sc. ! E-mail/SIP/Jabber: liman@autonomica.se\n",
"# Senior Systems Specialist ! Tel: +46 8 - 562 860 12\n",
"# Autonomica AB, Stockholm ! http://www.autonomica.se/\n",
"#----------------------------------------------------------------------\n",
"---------------------------------\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"earlywarning 1999-11-10 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"107attendees 1999-11-10 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"91all 1999-11-10 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"pidloc 1999-11-10 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"iola-wgcharter-tool 1999-11-10 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"oam 1999-11-10 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"6band 1999-11-10 00:00:00\n",
"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\n",
"103all 1999-11-10 00:00:00\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/_l/m1t5fd6s2yn3409p5pl_8q7c0000gn/T/ipykernel_6695/647057365.py:5: FutureWarning: Comparison of Timestamp with datetime.date is deprecated in order to match the standard library behavior. In a future version these will be considered non-comparable. Use 'ts == pd.Timestamp(date)' or 'ts.date() == date' instead.\n",
" matching_dates = archives[archive_names.index(root_name)].data.Date.dt.date == max_date\n"
]
}
],
"source": [
"# Read through some emails from those mailing lists\n",
"for c in top_master_cols:\n",
"    # First date on which this column reaches its maximum\n",
"    max_date = plot_data[c][plot_data[c] == plot_data[c].max()].index[0]\n",
"    root_name = c.split(\" \")[0]\n",
"    list_data = archives[archive_names.index(root_name)].data\n",
"    # max_date is a Timestamp; compare date with date to avoid the pandas FutureWarning\n",
"    matching_dates = list_data.Date.dt.date == max_date.date()\n",
"    print(\"* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *\")\n",
"    print(root_name, max_date)\n",
"    for b in list_data[matching_dates].Body:\n",
"        if \"master\" in b:\n",
"            print(b)\n",
"            print(\"---------------------------------\")"
]
},
{
"cell_type": "markdown",
"id": "b1d2203a",
"metadata": {},
"source": [
"## Look closer at 2018-2020"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "acc94f43",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>91all whitelist</th>\n",
" <th>91all blacklist</th>\n",
" <th>91all slave</th>\n",
" <th>91all master</th>\n",
" <th>sframe whitelist</th>\n",
" <th>sframe blacklist</th>\n",
" <th>sframe slave</th>\n",
" <th>sframe master</th>\n",
" <th>babel whitelist</th>\n",
" <th>babel blacklist</th>\n",
" <th>...</th>\n",
" <th>82attendees master_github</th>\n",
" <th>secdir master_github</th>\n",
" <th>mipshop master_github</th>\n",
" <th>80attendees master_github</th>\n",
" <th>cnrp master_github</th>\n",
" <th>6lowapp master_github</th>\n",
" <th>78attendees master_github</th>\n",
" <th>babel safetyslave</th>\n",
" <th>secdir safetyslave</th>\n",
" <th>cnrp safetyslave</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2018-10-28</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-29</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-30</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-31</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-11-01</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-12-27</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-12-28</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-12-29</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-12-30</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-12-31</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>796 rows × 90 columns</p>\n",
"</div>"
],
"text/plain": [
" 91all whitelist 91all blacklist 91all slave 91all master \\\n",
"2018-10-28 0.0 0.0 0.0 0.0 \n",
"2018-10-29 0.0 0.0 0.0 0.0 \n",
"2018-10-30 0.0 0.0 0.0 0.0 \n",
"2018-10-31 0.0 0.0 0.0 0.0 \n",
"2018-11-01 0.0 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2020-12-27 0.0 0.0 0.0 0.0 \n",
"2020-12-28 0.0 0.0 0.0 0.0 \n",
"2020-12-29 0.0 0.0 0.0 0.0 \n",
"2020-12-30 0.0 0.0 0.0 0.0 \n",
"2020-12-31 0.0 0.0 0.0 0.0 \n",
"\n",
" sframe whitelist sframe blacklist sframe slave sframe master \\\n",
"2018-10-28 0.0 0.0 0.0 0.0 \n",
"2018-10-29 0.0 0.0 0.0 0.0 \n",
"2018-10-30 0.0 0.0 0.0 0.0 \n",
"2018-10-31 0.0 0.0 0.0 0.0 \n",
"2018-11-01 0.0 0.0 0.0 0.0 \n",
"... ... ... ... ... \n",
"2020-12-27 0.0 0.0 0.0 0.0 \n",
"2020-12-28 0.0 0.0 0.0 0.0 \n",
"2020-12-29 0.0 0.0 0.0 0.0 \n",
"2020-12-30 0.0 0.0 0.0 0.0 \n",
"2020-12-31 0.0 0.0 0.0 0.0 \n",
"\n",
" babel whitelist babel blacklist ... 82attendees master_github \\\n",
"2018-10-28 0.0 0.0 ... 0.0 \n",
"2018-10-29 0.0 0.0 ... 0.0 \n",
"2018-10-30 0.0 0.0 ... 0.0 \n",
"2018-10-31 0.0 0.0 ... 0.0 \n",
"2018-11-01 0.0 0.0 ... 0.0 \n",
"... ... ... ... ... \n",
"2020-12-27 0.0 0.0 ... 0.0 \n",
"2020-12-28 0.0 0.0 ... 0.0 \n",
"2020-12-29 0.0 0.0 ... 0.0 \n",
"2020-12-30 0.0 0.0 ... 0.0 \n",
"2020-12-31 0.0 0.0 ... 0.0 \n",
"\n",
" secdir master_github mipshop master_github \\\n",
"2018-10-28 0.0 0.0 \n",
"2018-10-29 0.0 0.0 \n",
"2018-10-30 0.0 0.0 \n",
"2018-10-31 0.0 0.0 \n",
"2018-11-01 0.0 0.0 \n",
"... ... ... \n",
"2020-12-27 0.0 0.0 \n",
"2020-12-28 0.0 0.0 \n",
"2020-12-29 0.0 0.0 \n",
"2020-12-30 0.0 0.0 \n",
"2020-12-31 0.0 0.0 \n",
"\n",
" 80attendees master_github cnrp master_github \\\n",
"2018-10-28 0.0 0.0 \n",
"2018-10-29 0.0 0.0 \n",
"2018-10-30 0.0 0.0 \n",
"2018-10-31 0.0 0.0 \n",
"2018-11-01 0.0 0.0 \n",
"... ... ... \n",
"2020-12-27 0.0 0.0 \n",
"2020-12-28 0.0 0.0 \n",
"2020-12-29 0.0 0.0 \n",
"2020-12-30 0.0 0.0 \n",
"2020-12-31 0.0 0.0 \n",
"\n",
" 6lowapp master_github 78attendees master_github \\\n",
"2018-10-28 0.0 0.0 \n",
"2018-10-29 0.0 0.0 \n",
"2018-10-30 0.0 0.0 \n",
"2018-10-31 0.0 0.0 \n",
"2018-11-01 0.0 0.0 \n",
"... ... ... \n",
"2020-12-27 0.0 0.0 \n",
"2020-12-28 0.0 0.0 \n",
"2020-12-29 0.0 0.0 \n",
"2020-12-30 0.0 0.0 \n",
"2020-12-31 0.0 0.0 \n",
"\n",
" babel safetyslave secdir safetyslave cnrp safetyslave \n",
"2018-10-28 0.0 0.0 0.0 \n",
"2018-10-29 0.0 0.0 0.0 \n",
"2018-10-30 0.0 0.0 0.0 \n",
"2018-10-31 0.0 0.0 0.0 \n",
"2018-11-01 0.0 0.0 0.0 \n",
"... ... ... ... \n",
"2020-12-27 0.0 0.0 0.0 \n",
"2020-12-28 0.0 0.0 0.0 \n",
"2020-12-29 0.0 0.0 0.0 \n",
"2020-12-30 0.0 0.0 0.0 \n",
"2020-12-31 0.0 0.0 0.0 \n",
"\n",
"[796 rows x 90 columns]"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_2018_20 = plot_data[plot_data.index.date > datetime.date(2018, 10, 21)]\n",
"data_2018_20 = data_2018_20[data_2018_20.index.year <= 2020]\n",
"\n",
"window = 7 # rolling weekly average\n",
"data_2018_20 = data_2018_20.rolling(window).mean().dropna(how='all')\n",
"data_2018_20"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "dd398e50",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"top_columns = data_2018_20.sum().sort_values(ascending=False)[:10].index\n",
"\n",
"fig = plt.figure()\n",
"ax = fig.add_subplot(1, 1, 1)\n",
"\n",
"for c in top_columns:\n",
"    plt.plot(data_2018_20[c].index, data_2018_20[c], alpha=.4, label=c)\n",
"\n",
"# Mark 2018-10-22, the first day after the 2018-10-21 cutoff used above\n",
"plt.axvline(x=datetime.datetime(2018, 10, 22), linestyle=':', color=\"grey\", alpha=.5)\n",
"\n",
"plt.xticks(rotation=45)\n",
"plt.title(\"Top mailing lists using exclusive language in 2018-2020\")\n",
"plt.ylabel(\"Weekly average\")\n",
"# Place the legend outside the axes so it does not cover the lines\n",
"ax.legend(bbox_to_anchor=(1.1, 1.05))\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7da566a2",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Read some of those emails\n",
"max_date = datetime.date(2018, 11, 28)\n",
"root_name = \"dots\"\n",
"matching_dates = archives[archive_names.index(root_name)].data.Date.dt.date == max_date\n",
"for b in archives[archive_names.index(root_name)].data[matching_dates].Body:\n",
"    print(b)\n",
"    print(\"* * * * * * * * * * * * * * * * * * * * * * * * * \")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "167.6px"
},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 5
}