@MOON-CLJ
Last active December 15, 2015 06:09
text xapian_weibo index performance
index_text usage: replace
[2013-03-21 22:44:49] folder[_hehe_2011-08-21] num indexed: 350000
[2013-03-21 22:44:57] folder[_hehe_2010-12-14] num indexed: 360000
[2013-03-21 22:45:05] folder[_hehe_2010-12-14] num indexed: 370000
[2013-03-21 22:45:11] folder[_hehe_2010-04-08] num indexed: 380000
[2013-03-21 22:45:17] folder[_hehe_2010-10-25] num indexed: 390000
[2013-03-21 22:45:25] folder[_hehe_2011-07-02] num indexed: 400000
[2013-03-21 22:45:34] folder[_hehe_2011-10-10] num indexed: 410000
[2013-03-21 22:45:43] folder[_hehe_2011-02-02] num indexed: 420000
[2013-03-21 22:45:51] folder[_hehe_2011-02-02] num indexed: 430000
[2013-03-21 22:45:59] folder[_hehe_2010-10-25] num indexed: 440000
[2013-03-21 22:46:07] folder[_hehe_2010-07-17] num indexed: 450000
[2013-03-21 22:46:15] folder[_hehe_2010-02-17] num indexed: 460000
[2013-03-21 22:46:24] folder[_hehe_2009-09-20] num indexed: 470000
[2013-03-21 22:46:30] folder[_hehe_2011-07-02] num indexed: 480000
[2013-03-21 22:46:35] folder[_hehe_2010-07-17] num indexed: 490000
[2013-03-21 22:46:43] folder[_hehe_2011-05-13] num indexed: 500000
'index_weibos' 408.67 sec
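The `'index_weibos' 408.67 sec` line looks like the output of a timing wrapper around the indexing function. The wrapper itself is not shown in these notes, so the following is only a minimal sketch of how such a line could be produced; the decorator name `timeit` and the stand-in function body are assumptions.

```python
import functools
import time


def timeit(func):
    """Print the wall-clock time of each call as "'name' 12.34 sec"."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # %r adds the quotes around the function name, as in the log above.
            print("%r %.2f sec" % (func.__name__, elapsed))
    return wrapper


@timeit
def index_weibos(n):
    # Stand-in for the real indexing loop, which is not part of these notes.
    return sum(range(n))
```

Calling `index_weibos(10)` returns the function's result and prints one timing line in the same format as the log.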
index_text usage: add
[2013-03-21 23:03:09] folder[_hehe_2011-08-21] num indexed: 350000
[2013-03-21 23:03:17] folder[_hehe_2010-12-14] num indexed: 360000
[2013-03-21 23:03:25] folder[_hehe_2010-12-14] num indexed: 370000
[2013-03-21 23:03:30] folder[_hehe_2010-04-08] num indexed: 380000
[2013-03-21 23:03:36] folder[_hehe_2010-10-25] num indexed: 390000
[2013-03-21 23:03:44] folder[_hehe_2011-07-02] num indexed: 400000
[2013-03-21 23:03:52] folder[_hehe_2011-10-10] num indexed: 410000
[2013-03-21 23:04:01] folder[_hehe_2011-02-02] num indexed: 420000
[2013-03-21 23:04:08] folder[_hehe_2011-02-02] num indexed: 430000
[2013-03-21 23:04:16] folder[_hehe_2010-10-25] num indexed: 440000
[2013-03-21 23:04:24] folder[_hehe_2010-07-17] num indexed: 450000
[2013-03-21 23:04:32] folder[_hehe_2010-02-17] num indexed: 460000
[2013-03-21 23:04:41] folder[_hehe_2009-09-20] num indexed: 470000
[2013-03-21 23:04:46] folder[_hehe_2011-07-02] num indexed: 480000
[2013-03-21 23:04:53] folder[_hehe_2010-07-17] num indexed: 490000
[2013-03-21 23:05:00] folder[_hehe_2011-05-13] num indexed: 500000
'index_weibos' 400.86 sec
So in this 500,000-document scenario, replace and add make little difference (408.67 sec vs. 400.86 sec).
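To compare runs on more than the final total, the progress lines can be turned into a documents-per-second rate. This is a sketch that assumes the exact line format shown in the logs above; the helper names `parse_progress` and `rate_between` are mine.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2013-03-21 22:44:49] folder[_hehe_2011-08-21] num indexed: 350000
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"folder\[[^\]]*\] num indexed: (\d+)"
)


def parse_progress(line):
    """Return (timestamp, num_indexed), or None if the line is not a progress line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return stamp, int(match.group(2))


def rate_between(first_line, last_line):
    """Documents indexed per second between two progress lines."""
    t0, n0 = parse_progress(first_line)
    t1, n1 = parse_progress(last_line)
    return (n1 - n0) / (t1 - t0).total_seconds()
```

Passing the first and last progress lines of a run to `rate_between` gives the average rate over the logged window, which can then be compared between the replace and add runs.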
add_term
[2013-03-21 23:13:29] folder[_hehe_2011-08-21] num indexed: 350000
[2013-03-21 23:13:37] folder[_hehe_2010-12-14] num indexed: 360000
[2013-03-21 23:13:45] folder[_hehe_2010-12-14] num indexed: 370000
[2013-03-21 23:13:51] folder[_hehe_2010-04-08] num indexed: 380000
[2013-03-21 23:13:58] folder[_hehe_2010-10-25] num indexed: 390000
[2013-03-21 23:14:05] folder[_hehe_2011-07-02] num indexed: 400000
[2013-03-21 23:14:14] folder[_hehe_2011-10-10] num indexed: 410000
[2013-03-21 23:14:23] folder[_hehe_2011-02-02] num indexed: 420000
[2013-03-21 23:14:31] folder[_hehe_2011-02-02] num indexed: 430000
[2013-03-21 23:14:40] folder[_hehe_2010-10-25] num indexed: 440000
[2013-03-21 23:14:48] folder[_hehe_2010-07-17] num indexed: 450000
[2013-03-21 23:14:57] folder[_hehe_2010-02-17] num indexed: 460000
[2013-03-21 23:15:06] folder[_hehe_2009-09-20] num indexed: 470000
[2013-03-21 23:15:12] folder[_hehe_2011-07-02] num indexed: 480000
[2013-03-21 23:15:17] folder[_hehe_2010-07-17] num indexed: 490000
[2013-03-21 23:15:25] folder[_hehe_2011-05-13] num indexed: 500000
'index_weibos' 424.19 sec
Remove the single_word_whitelist filter
[2013-03-21 23:26:31] folder[_hehe_2011-08-21] num indexed: 350000
[2013-03-21 23:26:40] folder[_hehe_2010-12-14] num indexed: 360000
[2013-03-21 23:26:48] folder[_hehe_2010-12-14] num indexed: 370000
[2013-03-21 23:26:55] folder[_hehe_2010-04-08] num indexed: 380000
[2013-03-21 23:27:01] folder[_hehe_2010-10-25] num indexed: 390000
[2013-03-21 23:27:10] folder[_hehe_2011-07-02] num indexed: 400000
[2013-03-21 23:27:22] folder[_hehe_2011-10-10] num indexed: 410000
[2013-03-21 23:27:38] folder[_hehe_2011-02-02] num indexed: 420000
[2013-03-21 23:27:47] folder[_hehe_2011-02-02] num indexed: 430000
[2013-03-21 23:27:55] folder[_hehe_2010-10-25] num indexed: 440000
[2013-03-21 23:28:03] folder[_hehe_2010-07-17] num indexed: 450000
[2013-03-21 23:28:13] folder[_hehe_2010-02-17] num indexed: 460000
[2013-03-21 23:28:22] folder[_hehe_2009-09-20] num indexed: 470000
[2013-03-21 23:28:28] folder[_hehe_2011-07-02] num indexed: 480000
[2013-03-21 23:28:34] folder[_hehe_2010-07-17] num indexed: 490000
[2013-03-21 23:28:41] folder[_hehe_2011-05-13] num indexed: 500000
'index_weibos' 446.62 sec
This actually got slower, mainly because there are too many junk single-character terms.
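The filter being removed here is not shown in these notes. As a rough sketch of what a single-character whitelist filter might look like, assuming tokens arrive as a list of strings: the whitelist contents and the function name `filter_tokens` are hypothetical.

```python
# Hypothetical whitelist; the real single_word_whitelist is not shown in these notes.
SINGLE_WORD_WHITELIST = {"赞", "顶", "哭"}


def filter_tokens(tokens, whitelist=SINGLE_WORD_WHITELIST):
    """Keep multi-character tokens; keep a single character only if whitelisted."""
    return [tok for tok in tokens if len(tok) > 1 or tok in whitelist]
```

With a filter like this, `filter_tokens(["的", "微博", "赞", "了", "搜索"])` keeps only `"微博"`, `"赞"` and `"搜索"`. Without it, every single character becomes a term to index, which fits the slowdown measured above.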
json.dumps version
[2013-03-22 21:41:49] folder[_hehe_2011-08-21] num indexed: 350000
[2013-03-22 21:41:59] folder[_hehe_2010-12-14] num indexed: 360000
[2013-03-22 21:42:08] folder[_hehe_2010-12-14] num indexed: 370000
[2013-03-22 21:42:13] folder[_hehe_2010-04-08] num indexed: 380000
[2013-03-22 21:42:19] folder[_hehe_2010-10-25] num indexed: 390000
[2013-03-22 21:42:27] folder[_hehe_2011-07-02] num indexed: 400000
[2013-03-22 21:42:35] folder[_hehe_2011-10-10] num indexed: 410000
[2013-03-22 21:42:43] folder[_hehe_2011-02-02] num indexed: 420000
[2013-03-22 21:42:51] folder[_hehe_2011-02-02] num indexed: 430000
[2013-03-22 21:42:58] folder[_hehe_2010-10-25] num indexed: 440000
[2013-03-22 21:43:06] folder[_hehe_2010-07-17] num indexed: 450000
[2013-03-22 21:43:16] folder[_hehe_2010-02-17] num indexed: 460000
[2013-03-22 21:43:26] folder[_hehe_2009-09-20] num indexed: 470000
[2013-03-22 21:43:31] folder[_hehe_2011-07-02] num indexed: 480000
[2013-03-22 21:43:37] folder[_hehe_2010-07-17] num indexed: 490000
[2013-03-22 21:43:44] folder[_hehe_2011-05-13] num indexed: 500000
'index_weibos' 403.72 sec
dumps_exclude: drop the unneeded fields
[2013-03-22 22:15:47] folder[_hehe_2011-08-21] num indexed: 350000
[2013-03-22 22:15:55] folder[_hehe_2010-12-14] num indexed: 360000
[2013-03-22 22:16:03] folder[_hehe_2010-12-14] num indexed: 370000
[2013-03-22 22:16:10] folder[_hehe_2010-04-08] num indexed: 380000
[2013-03-22 22:16:16] folder[_hehe_2010-10-25] num indexed: 390000
[2013-03-22 22:16:24] folder[_hehe_2011-07-02] num indexed: 400000
[2013-03-22 22:16:32] folder[_hehe_2011-10-10] num indexed: 410000
[2013-03-22 22:16:41] folder[_hehe_2011-02-02] num indexed: 420000
[2013-03-22 22:16:49] folder[_hehe_2011-02-02] num indexed: 430000
[2013-03-22 22:16:56] folder[_hehe_2010-10-25] num indexed: 440000
[2013-03-22 22:17:04] folder[_hehe_2010-07-17] num indexed: 450000
[2013-03-22 22:17:12] folder[_hehe_2010-02-17] num indexed: 460000
[2013-03-22 22:17:19] folder[_hehe_2009-09-20] num indexed: 470000
[2013-03-22 22:17:25] folder[_hehe_2011-07-02] num indexed: 480000
[2013-03-22 22:17:29] folder[_hehe_2010-07-17] num indexed: 490000
[2013-03-22 22:17:36] folder[_hehe_2011-05-13] num indexed: 500000
'index_weibos' 375.16 sec
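The notes do not show how `dumps_exclude` is applied. One plausible reading is that a set of field names is dropped from each item before it is serialized with `json.dumps`, so less data is stored per document. A minimal sketch under that assumption follows; the field names and the helper `dump_item` are hypothetical.

```python
import json

# Hypothetical field names; the real exclusion list is not shown in these notes.
DUMPS_EXCLUDE = {"terms", "raw_html"}


def dump_item(item, exclude=DUMPS_EXCLUDE):
    """Serialize a dict to JSON, leaving out the keys listed in `exclude`."""
    kept = {key: value for key, value in item.items() if key not in exclude}
    return json.dumps(kept, ensure_ascii=False)
```

A smaller serialized payload means less data written per document, which may account for part of the drop from 403.72 sec to 375.16 sec.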
@MOON-CLJ

Running two scripts at the same time:
'index_weibos' 423.49 sec
'index_weibos' 432.44 sec
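This shows that a single script on its own does not hit the I/O bottleneck: with two running at once, each still finishes in about the same time as a solo run.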
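The same experiment can be repeated from one driver script instead of two terminals. This sketch starts several commands at once and times each one separately; the function names are mine, and the commands to run (for example the indexing script with different input folders) are left to the caller.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command to completion; return (returncode, elapsed_seconds)."""
    start = time.perf_counter()
    returncode = subprocess.run(cmd).returncode
    return returncode, time.perf_counter() - start


def run_concurrently(commands):
    """Run all commands at the same time, one worker thread per command."""
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(timed_run, commands))
```

If the per-command times stay close to the solo time, as in the two totals above, the disk is not yet the limiting factor. If they roughly double, the runs are competing for the same resource.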
