Skip to content

Instantly share code, notes, and snippets.

@dwhswenson
Last active October 6, 2021 22:05
Show Gist options
  • Save dwhswenson/50e435ed67c4b3cd6baee75a8ececdfb to your computer and use it in GitHub Desktop.
Save dwhswenson/50e435ed67c4b3cd6baee75a8ececdfb to your computer and use it in GitHub Desktop.
Hashing in Python
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "ea69d422",
"metadata": {},
"source": [
"# Hashing in Python\n",
"\n",
"This is a little exploration of hashing vs. equality in Python, as a way to play with some ideas that might be relevant to what we do in OpenPathSampling 2.0.\n",
"\n",
"For a good overview, see https://hynek.me/articles/hashes-and-equality/"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "a94e0e49",
"metadata": {},
"outputs": [],
"source": [
"class EqualityWithoutMutable:\n",
" def __init__(self, immut, mut):\n",
" self.immut = immut\n",
" self.mut = mut\n",
" \n",
" def __hash__(self):\n",
" return hash((self.__class__, self.immut))\n",
" \n",
" def __eq__(self, other):\n",
" return self.immut == other.immut\n",
" \n",
" def __repr__(self):\n",
" return f\"{self.__class__.__name__}('{self.immut}', {self.mut})\"\n",
"\n",
"\n",
"class EqualityWithMutable(EqualityWithoutMutable):\n",
" def __hash__(self):\n",
" return hash((self.__class__, self.immut))\n",
"\n",
" def __eq__(self, other):\n",
" return self.immut == other.immut and self.mut == other.mut"
]
},
{
"cell_type": "markdown",
"id": "b18b4fa8",
"metadata": {},
"source": [
"Aside: When overriding `__eq__`, we must also redefine `__hash__`. From [the docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__):\n",
"\n",
"> A class that overrides `__eq__()` and does not define `__hash__()` will have its `__hash__()` implicitly set to `None`.\n",
" \n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "42bf9c24",
"metadata": {},
"outputs": [],
"source": [
"w_a1 = EqualityWithMutable('a', 1)\n",
"w_a2 = EqualityWithMutable('a', 2)\n",
"\n",
"wo_a1 = EqualityWithoutMutable('a', 1)\n",
"wo_a2 = EqualityWithoutMutable('a', 2)"
]
},
{
"cell_type": "markdown",
"id": "c3ec4b2f",
"metadata": {},
"source": [
"First, there will be a hash collision between the a1 and a2 versions, since the hash is only calculated on the `'a'` input:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e760e372",
"metadata": {},
"outputs": [],
"source": [
"assert hash(w_a1) == hash(w_a2)\n",
"assert hash(wo_a1) == hash(wo_a2)"
]
},
{
"cell_type": "markdown",
"id": "0b27a5e9",
"metadata": {},
"source": [
"However, there is no hash collision between the two classes, because we use the class in the hash calculation:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3609b4b1",
"metadata": {},
"outputs": [],
"source": [
"assert hash(w_a1) != hash(wo_a1)"
]
},
{
"cell_type": "markdown",
"id": "cd9a3ddb",
"metadata": {},
"source": [
"## In dictionaries / sets\n",
"\n",
"If we check equality against the \"mutable\" quantity, then both objects go into any dictionary we create:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e596d204",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{EqualityWithMutable('a', 1): 'A', EqualityWithMutable('a', 2): 'B'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{w_a1: \"A\", w_a2: \"B\"}"
]
},
{
"cell_type": "markdown",
"id": "7acb4f0b",
"metadata": {},
"source": [
"However, if equality doesn't test against the \"mutable\" attributes, then some unexpected behavior can occur:\n",
"\n",
"1. The key is the first key to go into the dict\n",
"2. The value is that last value that matches that hash"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "524f528e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{EqualityWithoutMutable('a', 1): 'B'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{wo_a1: \"A\", wo_a2: \"B\"}"
]
},
{
"cell_type": "markdown",
"id": "cb5acb99",
"metadata": {},
"source": [
"## In lists\n",
"\n",
"Methods like `index` find the first object that matches `==`. So if we include the mutable aspects in `==`, we get the object we expect:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "013a1814",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = [w_a1, w_a2]\n",
"l.index(w_a2)"
]
},
{
"cell_type": "markdown",
"id": "f121b801",
"metadata": {},
"source": [
"If not, we get something that might not be expected:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "4b9cebd5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"l = [wo_a1, wo_a2]\n",
"l.index(wo_a2)"
]
},
{
"cell_type": "markdown",
"id": "1d23ce2d",
"metadata": {},
"source": [
"## Changing the mutable part\n",
"\n",
"If we change the mutable part, we can still get this out of a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "458f60d1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{EqualityWithMutable('a', 1): 'foo'}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"w_a_mut = EqualityWithMutable('a', 1)\n",
"dct = {w_a_mut: 'foo'}\n",
"dct"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "0ee01f81",
"metadata": {},
"outputs": [],
"source": [
"w_a_mut.mut = 2"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "d1a18a27",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{EqualityWithMutable('a', 2): 'foo'}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dct"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "fa2b4445",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'foo'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dct[w_a_mut]"
]
},
{
"cell_type": "markdown",
"id": "19675398",
"metadata": {},
"source": [
"## Changing the immutable part\n",
"\n",
"This is where we get into trouble!"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "383968e2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{EqualityWithMutable('a', 1): 'bar'}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"w_immut_1 = EqualityWithMutable('a', 1)\n",
"dct = {w_immut_1: 'bar'}\n",
"dct"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "47dfa595",
"metadata": {},
"outputs": [],
"source": [
"w_immut_1.immut = 'b'"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "45c94bda",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{EqualityWithMutable('b', 1): 'bar'}\n"
]
}
],
"source": [
"print(dct)"
]
},
{
"cell_type": "markdown",
"id": "aba53ab5",
"metadata": {},
"source": [
"It does see our object in the keys, but..."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "3e31eb2d",
"metadata": {},
"outputs": [
{
"ename": "KeyError",
"evalue": "EqualityWithMutable('b', 1)",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-16-f4c47ae2bdcf>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdct\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mw_immut_1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m: EqualityWithMutable('b', 1)"
]
}
],
"source": [
"dct[w_immut_1]"
]
},
{
"cell_type": "markdown",
"id": "4d6035a4",
"metadata": {},
"source": [
"What has happened here? Well, when we made `dct`, our object had certain hash value. That hash tells the `dict` the name of the bucket to store our result.\n",
"\n",
"When we changed the immutable part, we changed the value of the hash. When we asked the `dict` to look it up, it looked in a different bucket -- one that didn't exist.\n",
"\n",
"We were able to print `dct`, and the `repr` of the key is there and updated. The data is still there, but it can't be extracted by key. (However, we couldn't use IPython's default pretty-print -- we would have gotten the same `KeyError`!)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment