Last active
October 6, 2021 22:05
-
-
Save dwhswenson/50e435ed67c4b3cd6baee75a8ececdfb to your computer and use it in GitHub Desktop.
Hashing in Python
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "ea69d422", | |
"metadata": {}, | |
"source": [ | |
"# Hashing in Python\n", | |
"\n", | |
"This is a little exploration of hashing vs. equality in Python, as a way to play with some ideas that might be relevant to what we do in OpenPathSampling 2.0.\n", | |
"\n", | |
"For a good overview, see https://hynek.me/articles/hashes-and-equality/" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "a94e0e49", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"class EqualityWithoutMutable:\n", | |
" def __init__(self, immut, mut):\n", | |
" self.immut = immut\n", | |
" self.mut = mut\n", | |
" \n", | |
" def __hash__(self):\n", | |
" return hash((self.__class__, self.immut))\n", | |
" \n", | |
" def __eq__(self, other):\n", | |
" return self.immut == other.immut\n", | |
" \n", | |
" def __repr__(self):\n", | |
" return f\"{self.__class__.__name__}('{self.immut}', {self.mut})\"\n", | |
"\n", | |
"\n", | |
"class EqualityWithMutable(EqualityWithoutMutable):\n", | |
" def __hash__(self):\n", | |
" return hash((self.__class__, self.immut))\n", | |
"\n", | |
" def __eq__(self, other):\n", | |
" return self.immut == other.immut and self.mut == other.mut" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b18b4fa8", | |
"metadata": {}, | |
"source": [ | |
"Aside: When overriding `__eq__`, we must also redefine `__hash__`. From [the docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n", | |
"\n", | |
"> A class that overrides `__eq__()` and does not define `__hash__()` will have its `__hash__()` implicitly set to `None`.\n", | |
" \n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "42bf9c24", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"w_a1 = EqualityWithMutable('a', 1)\n", | |
"w_a2 = EqualityWithMutable('a', 2)\n", | |
"\n", | |
"wo_a1 = EqualityWithoutMutable('a', 1)\n", | |
"wo_a2 = EqualityWithoutMutable('a', 2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "c3ec4b2f", | |
"metadata": {}, | |
"source": [ | |
"First, there will be a hash collision between the a1 and a2 versions, since the hash is only calculated on the `'a'` input:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "e760e372", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"assert hash(w_a1) == hash(w_a2)\n", | |
"assert hash(wo_a1) == hash(wo_a2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "0b27a5e9", | |
"metadata": {}, | |
"source": [ | |
"However, there is no hash collision between the two classes, because we use the class in the hash calculation:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "3609b4b1", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"assert hash(w_a1) != hash(wo_a1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "cd9a3ddb", | |
"metadata": {}, | |
"source": [ | |
"## In dictionaries / sets\n", | |
"\n", | |
"If we check equality against the \"mutable\" quantity, then both objects go into any dictionary we create:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "e596d204", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{EqualityWithMutable('a', 1): 'A', EqualityWithMutable('a', 2): 'B'}" | |
] | |
}, | |
"execution_count": 5, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"{w_a1: \"A\", w_a2: \"B\"}" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "7acb4f0b", | |
"metadata": {}, | |
"source": [ | |
"However, if equality doesn't test against the \"mutable\" attributes, then some unexpected behavior can occur:\n", | |
"\n", | |
"1. The key is the first key to go into the dict\n", | |
"2. The value is that last value that matches that hash" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "524f528e", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{EqualityWithoutMutable('a', 1): 'B'}" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"{wo_a1: \"A\", wo_a2: \"B\"}" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "cb5acb99", | |
"metadata": {}, | |
"source": [ | |
"## In lists\n", | |
"\n", | |
"Methods like `index` find the first object that matches `==`. So if we include the mutable aspects in `==`, we get the object we expect:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "013a1814", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1" | |
] | |
}, | |
"execution_count": 7, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"l = [w_a1, w_a2]\n", | |
"l.index(w_a2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "f121b801", | |
"metadata": {}, | |
"source": [ | |
"If not, we get something that might not be expected:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "4b9cebd5", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"l = [wo_a1, wo_a2]\n", | |
"l.index(wo_a2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "1d23ce2d", | |
"metadata": {}, | |
"source": [ | |
"## Changing the mutable part\n", | |
"\n", | |
"If we change the mutable part, we can still get this out of a dictionary." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"id": "458f60d1", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{EqualityWithMutable('a', 1): 'foo'}" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"w_a_mut = EqualityWithMutable('a', 1)\n", | |
"dct = {w_a_mut: 'foo'}\n", | |
"dct" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "0ee01f81", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"w_a_mut.mut = 2" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"id": "d1a18a27", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{EqualityWithMutable('a', 2): 'foo'}" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dct" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"id": "fa2b4445", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'foo'" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"dct[w_a_mut]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "19675398", | |
"metadata": {}, | |
"source": [ | |
"## Changing the immutable part\n", | |
"\n", | |
"This is where we get into trouble!" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "383968e2", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{EqualityWithMutable('a', 1): 'bar'}" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"w_immut_1 = EqualityWithMutable('a', 1)\n", | |
"dct = {w_immut_1: 'bar'}\n", | |
"dct" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"id": "47dfa595", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"w_immut_1.immut = 'b'" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"id": "45c94bda", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"{EqualityWithMutable('b', 1): 'bar'}\n" | |
] | |
} | |
], | |
"source": [ | |
"print(dct)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "aba53ab5", | |
"metadata": {}, | |
"source": [ | |
"It does see our object in the keys, but..." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"id": "3e31eb2d", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"ename": "KeyError", | |
"evalue": "EqualityWithMutable('b', 1)", | |
"output_type": "error", | |
"traceback": [ | |
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", | |
"\u001b[0;32m<ipython-input-16-f4c47ae2bdcf>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdct\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mw_immut_1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", | |
"\u001b[0;31mKeyError\u001b[0m: EqualityWithMutable('b', 1)" | |
] | |
} | |
], | |
"source": [ | |
"dct[w_immut_1]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "4d6035a4", | |
"metadata": {}, | |
"source": [ | |
"What has happened here? Well, when we made `dct`, our object had certain hash value. That hash tells the `dict` the name of the bucket to store our result.\n", | |
"\n", | |
"When we changed the immutable part, we changed the value of the hash. When we asked the `dict` to look it up, it looked in a different bucket -- one that didn't exist.\n", | |
"\n", | |
"We were able to print `dct`, and the `repr` of the key is there and updated. The data is still there, but it can't be extracted by key. (However, we couldn't use IPython's default pretty-print -- we would have gotten the same `KeyError`!)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.10" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment