Skip to content

Instantly share code, notes, and snippets.

@cloga
Last active January 3, 2016 00:19
Show Gist options
  • Save cloga/8382454 to your computer and use it in GitHub Desktop.
Save cloga/8382454 to your computer and use it in GitHub Desktop.
Python多进程模块Multiprocessing介绍
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Multiprocessing](http://docs.python.org/2/library/multiprocessing.html)\u662fPython\u7684\u4e00\u4e2a\u6807\u51c6\u5e93\uff0c\u901a\u8fc7\u8fd9\u4e2a\u5e93\uff0c\u53ef\u4ee5\u5b9e\u73b0\u5e76\u884c\u7f16\u7a0b\uff0c\u66f4\u6709\u6548\u7684\u5229\u7528\u591a\u6838CPU\u3002\u7531\u4e8ePython\u7684GIL\u7684\u9650\u5236\uff0c\u9ed8\u8ba4\u60c5\u51b5\u4e0bPython\u65e0\u6cd5\u6709\u6548\u5229\u7528\u591a\u6838\u3002\u901a\u8fc7Multiprocessing\uff0c\u53ef\u4ee5\u521b\u5efa\u591a\u4e2a\u5b50\u7ebf\u7a0b\uff0c\u4ece\u800c\u66f4\u52a0\u6709\u6548\u7684\u5229\u7528\u591a\u6838\u3002\u8fd9\u7bc7\u6587\u4ef6\u4f1a\u4ecb\u7ecd\u4e00\u4e0b\u4f7f\u7528Multiprocessing\u7684\u7ebf\u7a0b\u6c60\uff08Pool\uff09\u5b9e\u73b0\u7b80\u5355\u7684\u5e76\u884c\u7f16\u7a0b\u3002\n",
"\n",
"Multiprocessing\u7c7b\u63d0\u4f9b\u4e86Pool\u5bf9\u8c61\uff0c\u901a\u8fc7\u8fdb\u7a0b\u6c60\u5bf9\u8c61\u6765\u7ba1\u7406\u548c\u521b\u5efa\u591a\u4e2a\u8fdb\u7a0b\u7684worker\uff0c\u5e76\u6536\u96c6\u8fd9\u4e9bWorker\u8fd4\u56de\u7684\u7ed3\u679c\u3002\n",
"\n",
"#\u7b80\u5355\u4efb\u52a1\u7684\u591a\u8fdb\u7a0b\u7f16\u7a0b"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import multiprocessing as mul\n",
"import os\n",
"from math import factorial"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5f15\u5165multiprocessiing\uff0c\u5f15\u5165os\u6a21\u5757\u7528\u4e8e\u67e5\u770b\u8fdb\u7a0bid\uff0c\u5f15\u5165\u9636\u4e58\u8ba1\u7b97\uff0c\u7528\u4e8e\u6d4b\u8bd5\u7b80\u5355\u4efb\u52a1\u4e0b\u7684\u591a\u8fdb\u7a0b\u7f16\u7a0b\u3002\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pool = mul.Pool()\n",
"mul.cpu_count()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"4"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u4e0a\u9762\u7684\u4f8b\u5b50\u5c31\u5b9e\u4f8b\u5316\u4e86\u4e00\u4e2a\u8fdb\u7a0b\u6c60\u3002Pool\u63a5\u53d7\u8fdb\u7a0b\u6570\u4f5c\u4e3a\u53c2\u6570\uff0c\u9ed8\u8ba4\u60c5\u51b5\u4e0b\uff0c\u4f1a\u4f7f\u7528cpu_count()\u7684\u503c\u4f5c\u4e3a\u8fdb\u7a0b\u7684\u9ed8\u8ba4\u503c\u3002\u6bd4\u5982\u6211\u7684\u7535\u8111\u7684\u8bdd\uff0cpool = mul.Pool()\u7b49\u4ef7\u4e8epool = mul.Pool(4)\n",
"\n",
"\u8ba9\u6211\u4eec\u8ba1\u7b971-100\u7684\u9636\u4e58\uff0c\u8fd4\u56de\u4e00\u4e2alist\uff0c\u7528\u4e8e\u6d4b\u8bd5\u591a\u8fdb\u7a0b\u7f16\u7a0b\u7684\u6548\u679c\u3002\u9996\u5148\u5b9a\u4e49\u4e00\u4e2a\u9636\u4e58\u7684\u51fd\u6570\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def get_factorial(num, pid = 0):\n",
" if pid:\n",
" print 'pid is', os.getpid()\n",
" return factorial(num)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u4e3a\u4e86\u663e\u793a\u5f53\u524d\u6240\u4f7f\u7528\u7684\u8fdb\u7a0b\uff0c\u8fd9\u91cc\u6211\u4eec\u9700\u8981\u4f7f\u7528os.getpid()\u83b7\u5f97\u8fdb\u7a0bid\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_factorial(100, pid=0)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 29.1 \u00b5s per loop\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"f_10 = get_factorial(10, pid=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pid is 25566\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u53ef\u4ee5\u770b\u5230\u5f53\u524d\u8fdb\u7a0b\u7684PID\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def f_list_serial(num, pid=0):\n",
" results = []\n",
" for n in range(1,num + 1):\n",
" results.append(get_factorial(n,pid=pid))\n",
" return results\n",
"\n",
"results = f_list_serial(5, pid=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pid is 27047\n",
"pid is 27047\n",
"pid is 27047\n",
"pid is 27047\n",
"pid is 27047\n"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5b9a\u4e49\u4e2a\u4e32\u884c\u8ba1\u7b97\u7684\u51fd\u6570\uff0c\u53ef\u4ee5\u770b\u5230pid\u90fd\u662f\u4e00\u4e2a\uff0c\u8bf4\u660e\u8ba1\u7b97\u662f\u5728\u4e00\u4e2a\u8fdb\u7a0b\u4e2d\u987a\u5e8f\u8fdb\u884c\u7684\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit f_list_serial(100, pid=0)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000 loops, best of 3: 594 \u00b5s per loop\n"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5982\u679c\u4e0d\u4f7f\u7528\u5e76\u884c\u8ba1\u7b97\u7684\u8bdd\uff0c\u8ba1\u7b971-100\u7684\u9636\u4e58\u9700\u8981\u7684\u65f6\u95f4\u662f300-400\u00b5s\u5de6\u53f3\u3002\u7531\u4e8e\u8ba1\u7b97\u6bcf\u4e2a\u6570\u5b57\u7684\u9636\u4e58\u4efb\u52a1\u4e4b\u95f4\u90fd\u662f\u72ec\u7acb\u7684\uff0c\u56e0\u6b64\u53ef\u4ee5\u4f7f\u7528\u7b80\u5355\u7684\u8fdb\u7a0b\u6c60\u6765\u8fdb\u7a0b\u5e76\u884c\u8ba1\u7b97\u3002\u5bf9\u4e8e\u591a\u4e2a\u4efb\u52a1\u4e4b\u95f4\u76f8\u4e92\u4f9d\u8d56\u6216\u8005\u9700\u8981\u5171\u4eab\u4fe1\u606f\u7684\u60c5\u51b5\u4e0d\u5728\u672c\u6587\u7684\u8ba8\u8bba\u4e4b\u4e2d\u3002\u5c06\u524d\u9762\u7684\u4e32\u884c\u8ba1\u7b97\u6539\u6210\u7528\u8fdb\u7a0b\u6c60\u6765\u8ba1\u7b97\u3002\n",
"\n",
"Multiprocessing\u63d0\u4f9b\u4e86apply\uff0capply_async\uff0cmap\u548cmap_async\u7b49\u591a\u79cd\u65b9\u6cd5\uff0c\u7528\u4e8e\u7ebf\u7a0b\u6c60\u7684\u8ba1\u7b97\u3002\u5176\u4e2d\u7684map\u548capply\u4e0e\u6807\u51c6\u6a21\u5757\u4e2d\u65b9\u6cd5\u7528\u6cd5\u7c7b\u4f3c\uff0c\u6240\u4e0d\u540c\u7684\u662fmap\u53ea\u63a5\u53d7\u4e00\u4e2a\u53c2\u6570\uff0c\u5982\u679c\u9700\u8981\u63a5\u53d7\u591a\u4e2a\u53c2\u6570\u5219\u6700\u597d\u4f7f\u7528apply\uff0c\u800capply_async\u548cmap_async\u5219\u662fmap\u548capply\u7684\u5f02\u6b65\u65b9\u6cd5\uff0c\u5176\u7ed3\u679c\u662f\u5f02\u6b65\u8fd4\u56de\u7684AsyncResult\u7c7b\u578b\u7684\u6570\u636e\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def f_list_para_apply_async(num,pid=0,pool=None):\n",
" pool = mul.Pool()\n",
" results_list = []\n",
" results = []\n",
" for n in range(1,num+1):\n",
" results_list.append(\n",
" pool.apply_async(get_factorial, args=(n,pid)))\n",
" pool.close()\n",
" pool.join()\n",
" for result in results_list:\n",
" results.append(result.get())\n",
" return results"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"r = f_list_para_apply_async(10,pid=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pid is 27131\n",
"pid is 27133\n",
"pid is 27132\n",
"pid is 27134\n",
"pid is 27131\n",
"pid is 27133\n",
"pid is 27132\n",
"pid is 27131\n",
"pid is 27133\n",
"pid is 27131\n"
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5982\u4e0a\u6240\u793a\uff0c\u4f7f\u7528multiprocessing\u7684\u8fdb\u7a0b\u6c60\u540e\u53ef\u4ee5\u770b\u5230\u4e0d\u540c\u7684\u4efb\u52a1\u662f\u5728\u4e0d\u540c\u7684\u8fdb\u7a0b\u4e2d\u8fdb\u7a0b\u3002\n",
"\n",
"\u5173\u4e8e\u4e0a\u9762\u7684\u51fd\u6570\u6709\u51e0\u70b9\u8bf4\u660e\uff1a\n",
"\n",
"- \u5728\u4e3b\u7a0b\u5e8f\u4e2d\u5b9e\u4f8b\u5316pool\u4f1a\u62a5\u9519\uff0c\u800c\u5728\u51fd\u6570\u4e2d\u5b9e\u4f8b\u5316\u5219\u53ef\u4ee5\u6b63\u5e38\u8fd0\u884c\uff0c\u8fd9\u4e00\u70b9\u5728Multiprocessing\u7684\u5b98\u65b9\u6587\u6863\u4e2d\u4e5f\u6709\u8bf4\u660e\uff0c\u5b98\u65b9\u6587\u6863\u7684\u8bf4\u6cd5\u662f\u8fd9\u4e2a\u5305\u9700\u8981__main__\u6a21\u5757\u88ab\u5b50\u6a21\u5757\u5bfc\u5165\n",
"- apply_async\u652f\u6301\u591a\u4e2a\u53c2\u6570\uff0c\u5982\u679c\u662f\u4f4d\u7f6e\u53c2\u6570\uff0c\u5219\u53ef\u4ee5\u4f7f\u7528args\u53c2\u6570\uff0c\u503c\u4e3atuple\uff0c\u5982\u679c\u662f\u5173\u952e\u5b57\u53c2\u6570\uff0c\u5219\u53ef\u4ee5\u4f7f\u7528kwds\u53c2\u6570\uff0c\u503c\u4e3a\u5b57\u5178\uff0c\u4e5f\u53ef\u4ee5\u540c\u65f6\u4f7f\u7528\u4e8c\u8005\n",
"- \u5f53\u6240\u6709\u4efb\u52a1\u90fd\u6267\u884c\u5b8c\u4e4b\u540e\u4e00\u5b9a\u8981\u8bb0\u5f97\u7528close()\u548cjoin()\u56de\u6536\u8fdb\u7a0b\uff0c\u4e0d\u7136\u8fd9\u4e9b\u8fdb\u7a0b\u4f1a\u53d8\u6210\u50f5\u5c38\u8fdb\u7a0b\uff0c\u4f1a\u9020\u6210\u6253\u5f00\u6587\u4ef6\u8fc7\u591a\u7684\u9519\u8bef\n",
"- \u6bcf\u6b21apply_async\u8fd4\u56de\u7684\u7ed3\u679c\u90fd\u662fAsyncResult\u5bf9\u8c61\uff0c\u9700\u8981\u901a\u8fc7get()\u65b9\u6cd5\u83b7\u5f97\u5176\u4e2d\u7684\u503c\u3002\u7531\u4e8eget()\u662f\u963b\u585e\u7684\u65b9\u5f0f\u5373\u540c\u6b65\u7684\u65b9\u5f0f\u5904\u7406\uff0c\u56e0\u6b64\uff0c\u5728\u6700\u540e\u7edf\u4e00\u5904\u7406\u8fd9\u4e9b\u7ed3\u679c\u5373\u53ef"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit r = f_list_para_apply_async(100)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 156 ms per loop\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u901a\u8fc7timeit\u51fd\u6570\u53ef\u4ee5\u770b\u5230\uff0c\u4f7f\u7528\u9ed8\u8ba4\u76844\u4e2a\u8fdb\u7a0b\uff08\u57fa\u4e8e\u6211\u7535\u8111\u76ee\u524d\u7684\u914d\u7f6e\uff09\u8ba1\u7b97\u65f6\u95f4\u6709\u660e\u663e\u7684\u589e\u52a0\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit r = f_list_para_apply_async(100,pool=50)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 165 ms per loop\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit r = f_list_para_apply_async(100,pool=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 152 ms per loop\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5c1d\u8bd5\u5c06\u8fdb\u7a0b\u589e\u52a0\u5230100\u6216\u8005\u5c06\u8fdb\u7a0b\u51cf\u5c11\u52301\u6ca1\u6709\u770b\u5230\u65f6\u95f4\u7684\u660e\u663e\u53d8\u5316\uff0c\u8fd9\u53ef\u80fd\u662f\u56e0\u4e3a\u8fd9\u4e2a\u4efb\u52a1\u8fc7\u4e8e\u7b80\u5355\uff0c\u4f7f\u7528\u591a\u8fdb\u7a0b\u66f4\u591a\u7684\u8d44\u6e90\u6d6a\u8d39\u5728\u8fdb\u7a0b\u5207\u6362\u4e0a\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def f_list_para_map_asyns(num, pool=None):\n",
" results_list = []\n",
" pool = mul.Pool(pool)\n",
" result = pool.map_async(get_factorial,range(1,num+1))\n",
" pool.close()\n",
" pool.join()\n",
" for result in results_list:\n",
" results.append(result.get())\n",
" return result"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit r = f_list_para_map_asyns(100)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 144 ms per loop\n"
]
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u4f7f\u7528map\u7684\u5f02\u6b65\u5f62\u5f0f\u83b7\u5f97\u8ba1\u7b97\u901f\u5ea6\u7684\u63d0\u5347\u4e0eapply_async\u76f8\u8fd1\uff0c\u4e0d\u8fc7\u7531\u4e8emap_async\u53ea\u652f\u6301\u4e00\u4e2a\u53c2\u6570\u7528\u9014\u5e94\u8be5\u6ca1\u6709apply\u4e30\u5bcc\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def f_list_para_apply(num,pid=0,pool=None):\n",
" pool = mul.Pool()\n",
" results = []\n",
" for n in range(1,num + 1):\n",
" results.append(pool.apply(get_factorial, args=(n,pid)))\n",
" pool.close()\n",
" pool.join()\n",
" return results"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"r = f_list_para_apply(10,pid=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pid is 27343\n",
"pid is 27344\n",
"pid is 27345\n",
"pid is 27346\n",
"pid is 27343\n",
"pid is 27344\n",
"pid is 27345\n",
"pid is 27346\n",
"pid is 27343\n",
"pid is 27344\n"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u4f7f\u7528map\u7684\u975e\u5f02\u6b65\u65b9\u5f0f\uff0c\u4ecd\u7136\u53ef\u4ee5\u770b\u5230pid\u7684\u53d8\u5316\uff0c\u770b\u6765\u6bcf\u4e2a\u4efb\u52a1\u90fd\u662f\u5728\u4e0d\u540c\u7684\u8fdb\u7a0b\u4e2d\u8fdb\u7a0b\uff0c\u53ea\u662f\u5404\u4e2a\u8fdb\u7a0b\u95f4\u4e0d\u662f\u5e76\u884c\u8fdb\u884c\u800c\u662f\u987a\u5e8f\u8fdb\u884c\uff0c\u5fc5\u987b\u8981\u7b49\u5230\u524d\u4e00\u4e2a\u8fdb\u7a0b\u8ba1\u7b97\u5b8c\u6210\u8fd4\u56de\u4e86\u7ed3\u679c\uff0c\u4e0b\u4e00\u4e2a\u8fdb\u7a0b\u624d\u4f1a\u5f00\u59cb\u8fdb\u884c\u8ba1\u7b97\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit r = f_list_para_apply(100)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 130 ms per loop\n"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5bf9\u4e8e\u9636\u4e58\u8fd9\u4e2a\u4f8b\u5b50\uff0c\u5f02\u6b65\u65b9\u5f0f\u4e0e\u540c\u6b65\u65b9\u5f0f\u7684\u65f6\u95f4\u76f8\u8fd1\uff0c\u8fd9\u4e2a\u5e94\u8be5\u662f\u4e0e\u9009\u62e9\u7684\u4efb\u52a1\u6709\u5173\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def f_list_para_map(num):\n",
" results_list = []\n",
" pool = mul.Pool()\n",
" result = pool.map(get_factorial,range(1,num+1))\n",
" pool.close()\n",
" pool.join()\n",
" return result"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"r = f_list_para_apply(10,pid=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pid is 26642\n",
"pid is 26643\n",
"pid is 26644\n",
"pid is 26645\n",
"pid is 26642\n",
"pid is 26643\n",
"pid is 26644\n",
"pid is 26645\n",
"pid is 26642\n",
"pid is 26643\n"
]
}
],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit r = f_list_para_map(100)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 166 ms per loop\n"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u4f7f\u7528map\u7684\u540c\u6b65\u65b9\u5f0f\u7ed3\u679c\u4e0eapply\u7684\u540c\u6b65\u65b9\u5f0f\u76f8\u8fd1\u3002"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#\u590d\u6742\u4efb\u52a1\u7684\u591a\u8fdb\u7a0b\u7f16\u7a0b\n",
"\u4e0a\u9762\u7684\u4f8b\u5b50\u53ef\u4ee5\u770b\u5230\u5bf9\u4e8e\u6bd4\u8f83\u7b80\u5355\uff0c\u8ba1\u7b97\u8017\u65f6\u8f83\u5c11\u7684\u4efb\u52a1\uff0c\u4f7f\u7528\u591a\u8fdb\u7a0b\u5f97\u4e0d\u507f\u5931\uff0c\u65f6\u95f4\u4e3b\u8981\u6d88\u8017\u5728\u8fdb\u7a0b\u5207\u6362\u4e0a\u65e0\u6cd5\u63d0\u9ad8\u8ba1\u7b97\u6548\u7387\u3002\u518d\u6765\u770b\u4e00\u4e0b\u8017\u65f6\u8f83\u957f\u7684\u4efb\u52a1\u4f7f\u7528\u591a\u8fdb\u7a0b\u7f16\u7a0b\u7684\u6548\u679c\u5982\u4f55\u3002\n",
"\n",
"\u6211\u4eec\u7528\u6293\u53d6\u5fae\u535a\u8f6c\u53d1\u6570\u636e\u4f5c\u4e3a\u4f8b\u5b50\u3002\u5e16\u5b50\u7684\u4f8b\u5b50\u9009\u53d6\u4e86[\u5173\u4e8e\u9038\u592b\u697c\u7684\u4e00\u4e2a\u70ed\u95e8\u5fae\u535a](http://weibo.com/1496853872/AqSPp8Pyp)\uff0cmid\u4e3a\uff1a3664072912104801\uff0ctoken\u4f7f\u7528\u5fae\u535a\u7ed9\u5230\u7684\u6d4b\u8bd5token\u3002\u9996\u5148\u5b9a\u4e49\u6293\u53d6\u5fae\u535a\u8f6c\u53d1\u6570\u636e\u7684\u51fd\u6570\uff1a"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"token = '2.00Hk5I5B0XUlu4bde500a7f8FHAqIB'\n",
"token = '2.00Hk5I5B3mz1gEdaf1f0cb3bCXBvsB'\n",
"import json,urllib2,urllib\n",
"def get_repost_timeline(id, count=200, page=1, pid=0, **keys):\n",
" if pid:\n",
" print 'pid', os.getpid(),'start!'\n",
" query_args = {'id': id, 'count': count, 'page': page,\n",
" 'access_token': token}\n",
" query_args.update(keys)\n",
" url = 'https://api.weibo.com/2/statuses/repost_timeline.json?'\n",
" encoded_args = urllib.urlencode(query_args)\n",
" content = urllib2.urlopen(url + encoded_args).read()\n",
" if pid:\n",
" print 'pid', os.getpid(),'finished!'\n",
" return json.loads(content)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5177\u4f53\u6587\u6863\u89c1[http://open.weibo.com/wiki/2/statuses/repost_timeline](http://open.weibo.com/wiki/2/statuses/repost_timeline)\u3002\u6309\u7167\u6587\u6863\u7684\u8bf4\u660e\uff0c\u8fd9\u4e2a\u63a5\u53e3\u53ea\u8fd4\u56de\u6700\u8fd12000\u6761\uff0c\u6bcf\u9875\u9ed8\u8ba4\u8fd4\u56de200\u6761\u7ed3\u679c\uff0c\u5219\u53ef\u4ee5\u5faa\u73af10\u6b21\u3002\n",
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_repost_timeline(3664072912104801)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 828 ms per loop\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u53ef\u4ee5\u770b\u5230\u83b7\u5f97200\u4e2a\u8f6c\u53d1\u9700\u8981\u7684\u65f6\u95f4\u6bd4\u8f83\u957f\uff0c\u67091s\u5de6\u53f3\u3002\n",
"\n",
"\u63a5\u4e0b\u6765\u518d\u5b9a\u4e49\u4e24\u4e2a\u51fd\u6570\u4e00\u4e2a\u662f\u4e32\u884c\u7684\u65b9\u5f0f\u6293\u53d62000\u6761\u8f6c\u53d1\uff0c\u4e00\u4e2a\u662f\u5f02\u6b65\u5e76\u884c\u65b9\u5f0f\u6293\u53d62000\u6761\u8f6c\u53d1\u3002\n",
"\n",
"\u9996\u5148\u662f\u7528\u4e32\u884c\u7684\u65b9\u5f0f\u83b7\u5f972000\u6761\u8f6c\u53d1\uff1a"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def get_post_reposts(mid):\n",
" reposts = []\n",
" total_number = get_repost_timeline(id=mid)['total_number']\n",
" page_number = total_number / 200 + 1\n",
" if page_number > 10:\n",
" page_number = 10\n",
" for i in range(1,page_number + 1):\n",
" reposts += get_repost_timeline(mid, page=i)['reposts']\n",
" return reposts"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts(3664072912104801)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 10.2 s per loop\n"
]
}
],
"prompt_number": 24
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u5982\u679c\u7528\u4e32\u884c\u7684\u65b9\u5f0f\uff0c\u6293\u53d610\u98752000\u6761\u8f6c\u53d1\u9700\u8981\u7684\u65f6\u95f4\u57fa\u672c\u4e0a\u662f\u6293\u53d6\u4e00\u9875\u8f6c\u53d1\u768410\u500d\uff0c\u6211\u4eec\u518d\u6765\u770b\u4e00\u4e0b\u7528\u591a\u8fdb\u7a0b\u7f16\u7a0b\u7684\u6548\u679c\u3002\u8fd9\u91cc\u9996\u5148\u4f7f\u7528\u5f02\u6b65\u7684apply\u65b9\u5f0f\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def get_post_reposts_para_async(mid, pool_num=None, pid=0):\n",
" reposts = []\n",
" results = []\n",
" pool = mul.Pool(pool_num)\n",
" total_number = get_repost_timeline(id=mid)['total_number']\n",
" page_number = total_number / 200 + 1\n",
" if page_number > 10:\n",
" page_number = 10\n",
" for i in range(1, page_number + 1):\n",
" results.append(pool.apply_async(get_repost_timeline, kwds=dict(id=mid, page=i, pid=pid)))\n",
" pool.close()\n",
" pool.join()\n",
" for result in results:\n",
" reposts += result.get()['reposts']\n",
" return reposts"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"reposts = get_post_reposts_para_async(3664072912104801, pid=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pid 1687 start!\n",
"pid 1689 start!\n",
"pid 1688 start!\n",
"pid 1690 start!\n",
"pidpidpidpid 1687 finished!\n",
" 1689 finished!\n",
" 1688 finished!\n",
" 1690 finished!\n",
"pidpidpidpid 1687 start!\n",
" 1689 start!\n",
" 1688 start!\n",
" 1690 start!\n",
"pidpidpidpid 1687 finished!\n",
" 1689 finished!\n",
" 1688 finished!\n",
" 1690 finished!\n",
"pid 1687 start!\n",
"pid 1689 start!\n",
"pidpid 1687 finished!\n",
" 1689 finished!\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u6bcf\u4e2apid\u662f\u5e76\u884c\u8fdb\u884c\u7684\uff0c\u4e00\u4e2a\u8fdb\u7a0b\u5f00\u59cb\u540e\uff0c\u5176\u4ed6\u7684\u8fdb\u7a0b\u4e5f\u53ef\u4ee5\u540c\u6b65\u5f00\u59cb\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts_para_async(3664072912104801)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 7.64 s per loop\n"
]
}
],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts_para_async(3664072912104801, pool_num=10)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 7.64 s per loop\n"
]
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts_para_async(3664072912104801, pool_num=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 10.3 s per loop\n"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts_para_async(3664072912104801, pool_num=100)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 8.24 s per loop\n"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u4ece\u8fd0\u884c\u65f6\u95f4\u6765\u770b\uff0c\u4f3c\u4e4e\u662f\u968f\u7740\u8fdb\u7a0b\u6c60\u7684\u589e\u52a0\u800c\u51cf\u5c11\uff0c\u8fbe\u523010\u6b21\uff0c\u5373\u8fd9\u4e2a\u4efb\u52a1\u7684\u6bcf\u4e2a\u5faa\u73af\u4efb\u52a1\u90fd\u6709\u4e00\u4e2a\u5355\u72ec\u7684\u8fdb\u7a0b\u8fbe\u5230\u4e00\u4e2a\u5cf0\u503c\uff0c\u4e4b\u540e\u8fd0\u884c\u65f6\u95f4\u4f1a\u51cf\u5c11\u3002\u4f46\u662f\u603b\u4f53\u6765\u8bf4\uff0c\u5f02\u6b65\u7684\u591a\u8fdb\u7a0b\u7f16\u7a0b\u6bd4\u4e32\u884c\u7684\u65f6\u95f4\u8981\u5c11\u3002"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def get_post_reposts_para(mid, pool_num=None, pid=0):\n",
" reposts = []\n",
" pool = mul.Pool(pool_num)\n",
" total_number = get_repost_timeline(id=mid)['total_number']\n",
" page_number = total_number / 20 + 1\n",
" if page_number > 10:\n",
" page_number = 10\n",
" for i in range(1,page_number):\n",
" reposts += pool.apply(get_repost_timeline,kwds=dict(id=mid, page=i, pid=pid))['reposts']\n",
" pool.close()\n",
" pool.join()\n",
" return reposts"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"reposts = get_post_reposts_para(3664072912104801, pid=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"pid 2206 start!\n",
"pid 2207 start!\n",
"pid 2208 start!\n",
"pid 2209 start!\n",
"pidpidpidpid 2206 finished!\n",
" 2207 finished!\n",
" 2208 finished!\n",
" 2209 finished!\n",
"pidpidpidpid 2206 start!\n",
" 2207 start!\n",
" 2208 start!\n",
" 2209 start!\n",
"pidpidpidpid 2206 finished!\n",
" 2207 finished!\n",
" 2208 finished!\n",
" 2209 finished!\n",
"pid 2206 start!\n",
"pid 2206 finished!\n"
]
}
],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts_para(3664072912104801)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 9.04 s per loop\n"
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts_para(3664072912104801,pool_num=10)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 9.87 s per loop\n"
]
}
],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"timeit get_post_reposts_para(3664072912104801,pool_num=1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 9.71 s per loop\n"
]
}
],
"prompt_number": 23
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\u7531\u4e8e\u662f\u975e\u5f02\u6b65\u7684\u5f62\u5f0f\uff0c\u6bcf\u4e2a\u8fdb\u7a0b\u7684\u4efb\u52a1\u5b8c\u6210\u540e\u624d\u4f1a\u542f\u52a8\u65b0\u7684\u8fdb\u7a0b\uff0c\u4e0d\u8fc7\u4ece\u8fd0\u884c\u65f6\u95f4\u4e0a\u6765\u770b\uff0c\u8fd8\u662f\u8981\u6bd4\u4e32\u884c\u7684\u5f62\u5f0f\u65f6\u95f4\u8981\u77ed\u3002"
]
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment