Skip to content

Instantly share code, notes, and snippets.

@p
Created March 8, 2012 14:02
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save p/7d6a1d57e1e895b2c342 to your computer and use it in GitHub Desktop.
Save p/7d6a1d57e1e895b2c342 to your computer and use it in GitHub Desktop.
(07:56:51) nw-: search index building takes forever
(07:56:57) nw-: why can we only manage 10 posts per second?
(07:58:57) bantu: nw-: I have no idea
(07:59:53) nw-: so my 10k test board
(08:00:01) nw-: takes under a minute to create?
(08:00:03) bantu: nw-: indexing 1mio board already took 4 days ;-)
(08:00:10) nw-: nginx times it out after 30 seconds
(08:00:15) nw-: maybe 1.5 minutes max
(08:00:21) nw-: takes 16 minutes to index
(08:01:05) nw-: those array ops are not looking very fast
(08:01:15) bantu: nw-: there is a cli script for creating index
(08:01:41) nw-: as in it does not attempt to process each post incrementally?
(08:01:50) nw-: why is the acp version doing this then?
(08:02:12) nw-: or will the script not work correctly on a live board
(08:02:31) nw-: i guess i don't care for the index contents
(08:04:40) paul999: nw-: not everyone can run it via cli? :)
(08:04:52) bantu: nw-: the problem is nginx in this case
(08:04:57) bantu: php does not time out
(08:05:08) bantu: but the cgi connection does
(08:05:17) bantu: we handle php timeout fine
(08:05:23) bantu: which is why it works on apache etc.
(08:05:32) nw-: no that's irrelevant
(08:05:38) nw-: php continues to work after nginx gives up on it
(08:05:48) nw-: the problem is the damn index takes 15 minutes to build
(08:06:34) bantu: admin index generation however sends back and forth requests, so it is bad if it times out
(08:06:57) bantu: I actually thought index generation is dominated by the http overhead, making it so slow
(08:07:03) nw-: i'm timing the cli version
(08:07:12) bantu: which is why I wrote the cli version
(08:07:15) nw-: index generation pegs cpu in php
(08:07:23) bantu: but it still is slow as hell
(08:07:50) nw-: the cli version has php at 89%, postgres at 8% and healthy amount of disk activity according to the led
(08:08:22) nw-: 1.9 mb/sec according to iostat
(08:08:41) nw-: my entire database is 30 megs?
(08:09:32) nw-: did you use transactions in the cli version? :)
(08:09:48) bantu: no
(08:09:59) bantu: but transaction might be used in the search code
(08:10:01) nw-: 12 posts/sec, it reports
(08:10:06) bantu: my cli process has 0% cpu and is not io bound
(08:10:17) bantu: and is running for 4 days already
(08:10:25) nw-: that's not very good
(08:10:35) nw-: if it's not cpu bound and not io bound what is it doing?
(08:11:09) nw-: ah. cli version just calls index on each post
(08:11:17) nw-: which i'm guessing is exactly what the web version does
(08:11:51) nw-: and color me silly but why is begin under an if of !sizeof($unique_add_words) ?
(08:14:14) nw-: ah nevermind
(08:14:18) nw-: it begins in the other branch too
(08:16:10) nw-: i think the post/sec divides over total time
(08:16:17) nw-: it went from 12 to 6 to 4
(08:17:27) nw-: yep
(08:18:07) bantu: yup
(08:18:33) nw-: i can fix it unless you want to fix it
(08:18:51) bantu: go ahead
(08:19:22) nw-: 10 transactions/second on a 4200 rpm disk is probably not shocking
(08:20:29) nw-: [phpBB Debug] PHP Notice: in file [ROOT]/includes/search/fulltext_mysql.php on line 858: Undefined index: total_posts
(08:20:30) nw-: hmm
(08:22:14) nw-: this is a fresh install
(08:22:38) bantu: I see
(08:22:55) nw-: that empty array assignment does not actually work very well
(08:23:15) nw-: either array cannot be left empty or clients need to isset it
(08:23:33) bantu: no, that's not the problem
(08:23:42) nw-: hmm?
(08:23:46) nw-: this is a postgres install
(08:23:46) bantu: or maybe it is
(08:23:57) bantu: let me check this properly
(08:24:02) nw-: and it looks like acp always displays mysql fulltext index stats
(08:24:08) nw-: on postgres those stats are not built
(08:24:26) nw-: if (strpos($db->sql_layer, 'mysql') === false)
(08:24:26) nw-: will be taken
(08:24:32) nw-: stats is an empty array
(08:24:40) nw-: then line 858 unconditionally indexes it
(08:30:14) nw-: ok so
(08:30:37) nw-: you say: $this->stats['total_posts'] = empty($this->stats) ? 0 : $db->get_estimated_row_count(POSTS_TABLE);
(08:30:54) nw-: if index stats could not be retrieved you still return total posts as 0
(08:31:07) nw-: and you write them into stats
(08:31:33) nw-: following this logic $this->stats = array(); earlier in that function should be changed to have a total_posts => 0
(08:33:31) bantu: instead of empty($this->stats) I want to use index_stats()
(08:33:37) bantu: but that ends up in an endless loop
(08:34:16) nw-: filling total posts will have the side effect of stats not being empty
(08:34:37) bantu: yeah
(08:34:57) nw-: say, wouldn't it be convenient if get_stats had a private declaration on it (via docblock for php4)?
(08:35:39) nw-: if get stats leaves stats empty it will be called again and again
(08:35:47) nw-: so by that reasoning it should leave stats not empty
(08:36:14) nw-: for the strpos mysql check the difference is obviously negligible
(08:37:03) nw-: the only reason to leave stats empty is if someone calls get_stats to see if we have an index of the appropriate type
(08:37:59) nw-: honestly it looks like get_stats should be private and index_stats is the intended public entry point
(08:38:43) nw-: delete_index does issets everywhere
(08:39:39) bantu: nw-: I've updated the branch
(08:39:55) bantu: the notice should no longer show
(08:40:02) bantu: Fixed a commit message as well, so rebased
(08:40:17) nw-: what do you think of making get_stats private in 3.1?
(08:40:19) nw-: or protected
(08:40:30) rxu_laptop: bantu : possible approach to 10684 fix https://github.com/rxu/phpbb3/compare/develop-olympus...ticket%2F10684
(08:42:35) bantu: nw-: probably a good idea, but http://area51.phpbb.com/phpBB/viewtopic.php?f=84&t=33441
(08:43:12) nw-: that is very open ended
(08:43:29) nw-: what was your point?
(08:43:52) bantu: my point it that it is shitty code anyway and there are soo many things to improve
(08:44:05) nn-: well
(08:44:09) nn-: private is 1 line diff
(08:44:37) APTX left the room (quit: Ping timeout: 245 seconds).
(08:44:51) nn-: http://area51.phpbb.com/phpBB/viewtopic.php?f=84&t=33441&p=236111#p236111 posted
(08:45:47) bantu: :-)
(08:45:48) nn-: i can't see how the diff no longer produces that notice
(08:45:57) nn-: what did you do?
(08:47:32) bantu: I originally changed $user->lang['FULLTEXT_MYSQL_TOTAL_POSTS'] => ($this->index_created()) ? $this->stats['total_posts'] : 0,
(08:47:38) bantu: to $user->lang['FULLTEXT_MYSQL_TOTAL_POSTS'] => $this->stats['total_posts']
(08:48:06) bantu: I've undone that and replaced it with https://github.com/bantu/phpbb3/commit/01d90e59a8c311778a832ba4920e4c5a85b1256f
(08:48:53) bantu: hmm, can't find the original commit right now
(08:49:10) nw-: i should have it in gitk somewhere
(08:49:30) nw-: ed4b867462bd6200e51643e77e001ccce9be9688
(08:49:47) nw-: so you reverted the first hunk of that
(08:49:47) bantu: yeah
(08:49:58) bantu: I replaced https://github.com/bantu/phpbb3/commit/ed4b867462bd6200e51643e77e001ccce9be9688 with https://github.com/bantu/phpbb3/commit/01d90e59a8c311778a832ba4920e4c5a85b1256f
(08:50:19) nw-: yep that should do it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment