Skip to content

Instantly share code, notes, and snippets.

@d2207197
Last active August 29, 2015 14:18
Show Gist options
  • Save d2207197/1c1cda1387d97a5e24b5 to your computer and use it in GitHub Desktop.
Save d2207197/1c1cda1387d97a5e24b5 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1, 2, 3 層 dictionary 效能測試"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 8.4 s per loop\n"
]
}
],
"source": [
"%timeit d = { k:k**2 for k in range(10000000)}"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 9.38 s per loop\n"
]
}
],
"source": [
"%timeit d = { (n1,n2): n1*n2 for n1 in range(10000) for n2 in range(1000)}"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 5.18 s per loop\n"
]
}
],
"source": [
"%timeit d = {n1:{ n2: n1*n2 for n2 in range(1000)} for n1 in range(10000)}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"切成兩層 dictionary 時,效能有顯著提升。"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 6.77 s per loop\n"
]
}
],
"source": [
"%timeit d = {n1:{ n2: {n3: (n1*100+n2)*n3 for n3 in range(1000)} for n2 in range(100)} for n1 in range(100)}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"切到三層時稍微變慢了點,應該是 100 個 keys 跟 1000 個 keys 效率差異不大。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 建立多層 dictionary \n",
"\n",
"通常使用 defaultdict 來減少初始化的麻煩"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### defaultdict\n",
"\n",
"提供具有預設值的 dictionary"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from collections import defaultdict"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"intd = defaultdict(int)\n",
"intd['hello']"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"intd['hey'] += 1 # 可以直接增加空白 key 的值\n",
"intd['hey']"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 133,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"floatd = defaultdict(float)\n",
"floatd['hello']"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 137,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"_1dict = defaultdict(lambda: 1) # 預設值為 1\n",
"_1dict['hello']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 建立兩層 dictionary\n",
"\n",
"使用 dict 作為 defaultdict 的 default factory"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 5.14 s per loop\n"
]
}
],
"source": [
"%%timeit \n",
"d = defaultdict(dict)\n",
"for n1 in range(10000):\n",
" for n2 in range(1000):\n",
" d[n1][n2] = n1*n2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如需要預設值為 0,需連接兩個 defaultdict"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 134,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int_2lvl_dict = defaultdict(lambda: defaultdict(int))\n",
"int_2lvl_dict['hey']['hello']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 三層 dictionary \n",
"可以使用兩個 defaultdict\n",
"\n",
"簡單好懂的寫法是,但會有無法pickle 的缺點\n",
"```python\n",
"d = defaultdict(lambda: defaultdict(dict))\n",
"```\n",
"\n",
"配合 partial 就可以 pickle,而且速度似乎會稍微快一點點\n",
"```python\n",
"d = defaultdict(partial(defaultdict, dict))\n",
"```\n",
"\n",
"如果需要預設值為 0 就須連結三層 defaultdict\n",
"\n",
"```python\n",
"d = defaultdict(lambda: defaultdict(lambda: int))\n",
"d = defaultdict(partial(defaultdict, partial(defualtdict, int)))\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from functools import partial"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 6.53 s per loop\n"
]
}
],
"source": [
"%%timeit\n",
"d = defaultdict(partial(defaultdict, dict))\n",
"for n1 in range(100):\n",
" for n2 in range(100):\n",
" for n3 in range(1000):\n",
" d[n1][n2][n3] = (n1*100+n2)*n3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Counter\n",
"\n",
"`Counter` 本身就是 dictionary ,不過預設值都是 0,很類似 `defaultdict(int)` ,不過有幾點不同\n",
"\n",
"- 有 `most_common()` method,可以直接取出 count 數量前幾名的 key。\n",
"- `__getitem__(key)` 後不會產生 key"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('stop', 1000), (('hey', 'yo'), 7)]\n",
"Counter({'stop': 1000, ('hey', 'yo'): 7, 'hello': 3, 'world': 1})\n"
]
}
],
"source": [
"from collections import Counter\n",
"word_cnt = Counter()\n",
"word_cnt['hello'] +=3\n",
"word_cnt['world'] +=1\n",
"word_cnt['stop'] = 1000\n",
"word_cnt[('hey', 'yo')] = 7\n",
"print word_cnt.most_common(2) # 列出 count 前幾名的字,依數量排序\n",
"print word_cnt"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"defaultdict(<type 'int'>, {'hey': 0})\n",
"Counter()\n",
"dint has 'hey'\n"
]
}
],
"source": [
"dint = defaultdict(int)\n",
"cnt = Counter()\n",
"\n",
"dint['hey']\n",
"cnt['hey']\n",
"\n",
"print dint\n",
"print cnt\n",
"\n",
"for k in dint:\n",
" print 'dint has', repr(k)\n",
"for k in cnt:\n",
" print 'cnt has', repr(k)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 三層以上的 dictionary + counter"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"_1levelcounter = Counter()\n",
"_2levelcounter = defaultdict(Counter)\n",
"_3levelcounter = defaultdict(partial(defaultdict, Counter))\n",
"_4levelcounter = defaultdict(partial(defaultdict, partial(defaultdict, Counter)))\n",
"_5levelcounter = defaultdict(partial(defaultdict, partial(defaultdict, partial(defaultdict, Counter))))"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"_1levelcounter['hello'] += 10\n",
"_2levelcounter['hello']['world'] += 10\n",
"_3levelcounter['hello']['world']['hey'] += 10\n",
"_4levelcounter['hello']['world']['hey'][-5] += 10\n",
"_5levelcounter['hello']['world']['hey'][-5][('foo', 'bar')] += 10"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10\n",
"10\n",
"10\n",
"10\n",
"10\n"
]
}
],
"source": [
"print _1levelcounter['hello']\n",
"print _2levelcounter['hello']['world']\n",
"print _3levelcounter['hello']['world']['hey']\n",
"print _4levelcounter['hello']['world']['hey'][-5]\n",
"print _5levelcounter['hello']['world']['hey'][-5][('foo', 'bar')]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# n level counter generator"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def n_lvl_Counter(n):\n",
" _n_lvl_Counter = lambda n: partial(defaultdict, _n_lvl_Counter(n-1)) if n > 1 else Counter\n",
" return _n_lvl_Counter(n)()\n"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"_1lvl = n_lvl_Counter(1)\n",
"_1lvl['hey']"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"_2lvl = n_lvl_Counter(2)\n",
"_2lvl['hey']['hey']"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"_3lvl = n_lvl_Counter(3)\n",
"_3lvl['hey']['hey']['hey']"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"defaultdict(<functools.partial object at 0x10bc4cc58>, {})"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"n_lvl_Counter(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 低效率的 in dict.keys() 操作"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"d = { k:k**2 for k in range(1000000)}"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 115 ms, sys: 2.8 ms, total: 118 ms\n",
"Wall time: 123 ms\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 167,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time 9999 in d.keys()"
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.66 ms, sys: 44 µs, total: 1.7 ms\n",
"Wall time: 2.39 ms\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 168,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time 9999 in d.iterkeys()"
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 6 µs, sys: 1 µs, total: 7 µs\n",
"Wall time: 10 µs\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 170,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time 9999 in d.viewkeys()"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3 µs, sys: 0 ns, total: 3 µs\n",
"Wall time: 7.87 µs\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 169,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time 9999 in d"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"速度約是 \n",
" \n",
"> `in d` > `in d.viewkeys()` >>> `in d.iterkeys()` >> `in d.keys()`"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://gist.github.com/1c1cda1387d97a5e24b5\r\n"
]
}
],
"source": [
"!gist -u 1c1cda1387d97a5e24b5 multilevel\\ dictionary.ipynb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment