Skip to content

Instantly share code, notes, and snippets.

@refraction-ray
Last active March 29, 2019 02:29
Show Gist options
  • Save refraction-ray/fbb6cb3b527d2ca49606b83cc6427e0e to your computer and use it in GitHub Desktop.
Save refraction-ray/fbb6cb3b527d2ca49606b83cc6427e0e to your computer and use it in GitHub Desktop.
A minimal tutorial on collections module in python
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Demo on collections module\n",
"**Reference:** [python doc](https://docs.python.org/3.6/library/collections.html)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import collections"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## ChainMap"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"ChainMap is a faster way to merge some dicts, compared to `dict_old.update(dict_new)`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[{'Bob': False, 'John': True}, {'Alice': True}]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict1 = {\"John\": True, \"Bob\": False}\n",
"dict2 = {\"Alice\": True}\n",
"cm = collections.ChainMap(dict1, dict2) # this is first than merging two dicts by updates\n",
"cm.maps"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[{'Bob': False, 'John': True}, {'Alice': True, 'Lucy': False}]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict2[\"Lucy\"]=False\n",
"cm.maps ## the edit on the input can be reflected by chainmap automatically"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[{'Bob': False, 'John': True, 'Mike': False}, {'Alice': True, 'Lucy': False}]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm[\"Mike\"]=False\n",
"cm.maps ## all the add on cm itself is reflected on the first map"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm['Alice'] ## the search is go through the maps list, slower than one dict! "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict2['John']=False\n",
"cm['John'] # for the same key, the previous one will override the latter one when query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"## a deep search version of ChainMap, the ordinary chainmap only update the first dict in the map list\n",
"## below is a hack subclass where update can also happen in other dicts\n",
"\n",
"class DeepChainMap(ChainMap):\n",
" 'Variant of ChainMap that allows direct updates to inner scopes'\n",
"\n",
" def __setitem__(self, key, value):\n",
" for mapping in self.maps:\n",
" if key in mapping:\n",
" mapping[key] = value\n",
" return\n",
" self.maps[0][key] = value\n",
"\n",
" def __delitem__(self, key):\n",
" for mapping in self.maps:\n",
" if key in mapping:\n",
" del mapping[key]\n",
" return\n",
" raise KeyError(key)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[{},\n",
" {'Bob': False, 'John': True, 'Mike': False},\n",
" {'Alice': True, 'John': False, 'Lucy': False}]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm_c = cm.new_child() ## support for nested context\n",
"cm_c.maps "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"ChainMap({'John': True, 'Bob': False, 'Mike': False}, {'Alice': True, 'Lucy': False, 'John': False})"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm_c.parents"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## Counter"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"counter is ... just a counter, similar with the implmentation via dict, actually a subclass of dict which can directly count iterable input"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({1: 3, 2: 2, 3: 1})"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ct = collections.Counter()\n",
"for w in [1,2,3,1,2,1]:\n",
" ct[w] += 1\n",
"ct"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({1: 3, 2: 2, 3: 1})"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## better way to count\n",
"collections.Counter([1,2,3,1,2,1])"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({' ': 1,\n",
" '!': 1,\n",
" 'd': 1,\n",
" 'e': 1,\n",
" 'h': 1,\n",
" 'l': 3,\n",
" 'o': 2,\n",
" 'r': 1,\n",
" 'w': 1})"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wd=collections.Counter(\"hello world!\")\n",
"wd"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wd[\"z\"] # for missing value, return 0 instead of rasing error"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"['h', 'e', 'l', 'l', 'l', 'o', 'o', ' ', 'w', 'r', 'd', '!']"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(wd.elements()) ## same things together"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[('l', 3), ('o', 2)]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wd.most_common(2) ## the 2 most common char"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({' ': 1,\n",
" '!': 1,\n",
" 'd': 1,\n",
" 'e': 1,\n",
" 'h': 1,\n",
" 'l': 13,\n",
" 'o': 2,\n",
" 'r': 1,\n",
" 'w': 1})"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wd2 = {\"l\":10}\n",
"wd.update(wd2)\n",
"wd # the value is added instead of updated by dict.update"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"Counter({'l': 10})"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wd2 = collections.Counter(wd2)\n",
"wd2 ## counter can also directly initialized with a dict"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"(Counter({' ': 1,\n",
" '!': 1,\n",
" 'd': 1,\n",
" 'e': 1,\n",
" 'h': 1,\n",
" 'l': 23,\n",
" 'o': 2,\n",
" 'r': 1,\n",
" 'w': 1}),\n",
" Counter({' ': 1,\n",
" '!': 1,\n",
" 'd': 1,\n",
" 'e': 1,\n",
" 'h': 1,\n",
" 'l': 3,\n",
" 'o': 2,\n",
" 'r': 1,\n",
" 'w': 1}),\n",
" Counter({'l': 10}),\n",
" Counter({' ': 1,\n",
" '!': 1,\n",
" 'd': 1,\n",
" 'e': 1,\n",
" 'h': 1,\n",
" 'l': 13,\n",
" 'o': 2,\n",
" 'r': 1,\n",
" 'w': 1}))"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wd+wd2, wd-wd2, wd&wd2, wd|wd2 ## overload for +, - (keep only positive ones), min, max"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## deque"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"deque is a double-ended queue, it is thread safe, and O(1) for pop and append on both side of the queue"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"dq = collections.deque([1,2,3,4],maxlen=5) # the maxlen gives the restriction on the queue, new append must \n",
"# come with pop from the other end when maxlen is reached "
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[1, 2, 3, 4, 5]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dq.append(5)\n",
"list(dq)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[0, 1, 2, 3, 4]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dq.appendleft(0)\n",
"list(dq)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dq.count(1) # count how many 1 in the queue"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[3, 4, 6, 7, 8]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dq.extend([6,7,8])\n",
"list(dq)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dq.popleft()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[4, 6, 7, 8]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(dq)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"hidden": true
},
"outputs": [],
"source": [
"dq.rotate(2)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[7, 8, 4, 6]"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(dq) ## rotate the linked list 2 steps to the right"
]
},
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## defaultdict"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true
},
"source": [
"of course, defaultdict is a subclass of dict. with a new input for constructor called default_factory. \n",
"It is designed for a default value of the dict when key is missing. Namely everytime the accessed key is missing,\n",
"the default_factory() is called."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"defaultdict(list, {'blue': [2, 4], 'red': [1], 'yellow': [1, 3]})"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]\n",
"d = collections.defaultdict(list)\n",
"[d[k].append(v) for k,v in s]\n",
"d"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## to understand the above code, just recall that\n",
"list() ## the is the function called each time a missing key is accessed"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"{'blue': [2, 4], 'red': [1], 'yellow': [1, 3]}"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"## the usage above is equivalent to\n",
"d = {}\n",
"for k, v in s:\n",
" d.setdefault(k, []).append(v)\n",
"d"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"hidden": true
},
"outputs": [
{
"data": {
"text/plain": [
"(0, set(), ())"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(),set(),tuple() ## lots of interesting default factory can be used and explored!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## namedtuple"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"assign name to tuple, which is better understood compared to access by index pos"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"Point = collections.namedtuple('Point','x y') # for the field name ['x', 'y'], 'x y', 'x,y' are all legal formats\n",
"p=Point(11,22)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(11, 22)"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"p.x, p.y"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(11, 22)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"p[0], p[1] ## the old fashioned way in tuple also works"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"33"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"p = p._replace(x=33) ## tuple finally can be changed\n",
"p.x"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('x', 'y')"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"p._fields ## view field names"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## OrderedDict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A dict in python is implemented by hash to obtain O(1) speed with the cost that the keys cannot be sorted.\n",
"An orderdict, well, is a dict remembering order. (order of keys that first inserted)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"od = collections.OrderedDict([(1,2),(3,4)])"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2, 4)"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"od[1], od[3]"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 4)"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"od.popitem(last=True) # pop in LIFO fashion if last is True (by default), otherwise pop in FIFO order"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([(1, 2)])"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"od"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment