Skip to content

Instantly share code, notes, and snippets.


p/gist:7d6a1d57e1e895b2c342 Secret

Created Mar 8, 2012
What would you like to do?
(07:56:51) nw-: search index building takes forever
(07:56:57) nw-: why can we only manage 10 posts per second?
(07:58:57) bantu: nw-: I have no idea
(07:59:53) nw-: so my 10k test board
(08:00:01) nw-: takes under a minute to create?
(08:00:03) bantu: nw-: indexing 1mio board already took 4 days ;-)
(08:00:10) nw-: nginx times it out after 30 seconds
(08:00:15) nw-: maybe 1.5 minutes max
(08:00:21) nw-: takes 16 minutes to index
(08:01:05) nw-: those array ops are not looking very fast
(08:01:15) bantu: nw-: there is a cli script for creating index
(08:01:41) nw-: as in it does not attempt to process each post incrementally?
(08:01:50) nw-: why is the acp version doing this then?
(08:02:12) nw-: or will the script not work correctly on a live board
(08:02:31) nw-: i guess i don't care for the index contents
(08:04:40) paul999: nw-: not everyone can run it via cli? :)
(08:04:52) bantu: nw-: the problem is nginx in this case
(08:04:57) bantu: php does not time out
(08:05:08) bantu: but the cgi connection does
(08:05:17) bantu: we handle php timeout fine
(08:05:23) bantu: which is why it works on apache etc.
(08:05:32) nw-: no that's irrelevant
(08:05:38) nw-: php continues to work after nginx gives up on it
(08:05:48) nw-: the problem is the damn index takes 15 minutes to build
(08:06:34) bantu: admin index generation however sends back and forth requests, so it is bad if it times out
(08:06:57) bantu: I actually thought index generation is dominated by the http overhead, making it so slow
(08:07:03) nw-: i'm timing the cli version
(08:07:12) bantu: which is why I wrote the cli version
(08:07:15) nw-: index generation pegs cpu in php
(08:07:23) bantu: but it still is slow as hell
(08:07:50) nw-: the cli version has php at 89%, postgres at 8% and healthy amount of disk activity according to the led
(08:08:22) nw-: 1.9 mb/sec according to iostat
(08:08:41) nw-: my entire database is 30 megs?
(08:09:32) nw-: did you use transactions in the cli version? :)
(08:09:48) bantu: no
(08:09:59) bantu: but transaction might be used in the search code
(08:10:01) nw-: 12 posts/sec, it reports
(08:10:06) bantu: my cli process has 0% cpu and is not io bound
(08:10:17) bantu: and is running for 4 days already
(08:10:25) nw-: that's not very good
(08:10:35) nw-: if it's not cpu bound and not io bound what is it doing?
(08:11:09) nw-: ah. cli version just calls index on each post
(08:11:17) nw-: which i'm guessing is exactly what the web version does
(08:11:51) nw-: and color me silly but why is begin under an if of !sizeof($unique_add_words) ?
(08:14:14) nw-: ah nevermind
(08:14:18) nw-: it begins in the other branch too
(08:16:10) nw-: i think the post/sec divides over total time
(08:16:17) nw-: it went from 12 to 6 to 4
(08:17:27) nw-: yep
(08:18:07) bantu: yup
(08:18:33) nw-: i can fix it unless you want to fix it
(08:18:51) bantu: go ahead
(08:19:22) nw-: 10 transactions/second on a 4200 rpm disk is probably not shocking
(08:20:29) nw-: [phpBB Debug] PHP Notice: in file [ROOT]/includes/search/fulltext_mysql.php on line 858: Undefined index: total_posts
(08:20:30) nw-: hmm
(08:22:14) nw-: this is a fresh install
(08:22:38) bantu: I see
(08:22:55) nw-: that empty array assignment does not actually work very well
(08:23:15) nw-: either array cannot be left empty or clients need to isset it
(08:23:33) bantu: no, that's not the problem
(08:23:42) nw-: hmm?
(08:23:46) nw-: this is a postgres install
(08:23:46) bantu: or maybe it is
(08:23:57) bantu: let me check this properly
(08:24:02) nw-: and it looks like acp always displays mysql fulltext index stats
(08:24:08) nw-: on postgres those stats are not built
(08:24:26) nw-: if (strpos($db->sql_layer, 'mysql') === false)
(08:24:26) nw-: will be taken
(08:24:32) nw-: stats is an empty array
(08:24:40) nw-: then line 858 unconditionally indexes it
(08:30:14) nw-: ok so
(08:30:37) nw-: you say: $this->stats['total_posts'] = empty($this->stats) ? 0 : $db->get_estimated_row_count(POSTS_TABLE);
(08:30:54) nw-: if index stats could not be retrieved you still return total posts as 0
(08:31:07) nw-: and you write them into stats
(08:31:33) nw-: following this logic $this->stats = array(); earlier in that function should be changed to have a total_posts => 0
(08:33:31) bantu: instead of empty($this->stats) I want to use index_stats()
(08:33:37) bantu: but that ends up in an endless loop
(08:34:16) nw-: filling total posts will have the side effect of stats not being empty
(08:34:37) bantu: yeah
(08:34:57) nw-: say, wouldn't it be convenient if get_stats had a private declaration on it (via docblock for php4)?
(08:35:39) nw-: if get stats leaves stats empty it will be called again and again
(08:35:47) nw-: so by that reasoning it should leave stats not empty
(08:36:14) nw-: for the strpos mysql check the difference is obviously negligible
(08:37:03) nw-: the only reason to leave stats empty is if someone calls get_stats to see if we have an index of the appropriate type
(08:37:59) nw-: honestly it looks like get_stats should be private and index_stats is the intended public entry point
(08:38:43) nw-: delete_index does issets everywhere
(08:39:39) bantu: nw-: I've updated the branch
(08:39:55) bantu: the notice should no longer show
(08:40:02) bantu: Fixed a commit message as well, so rebased
(08:40:17) nw-: what do you think of making get_stats private in 3.1?
(08:40:19) nw-: or protected
(08:40:30) rxu_laptop: bantu : possible approach to 10684 fix
(08:42:35) bantu: nw-: probably a good idea, but
(08:43:12) nw-: that is very open ended
(08:43:29) nw-: what was your point?
(08:43:52) bantu: my point it that it is shitty code anyway and there are soo many things to improve
(08:44:05) nn-: well
(08:44:09) nn-: private is 1 line diff
(08:44:37) APTX left the room (quit: Ping timeout: 245 seconds).
(08:44:51) nn-: posted
(08:45:47) bantu: :-)
(08:45:48) nn-: i can't see how the diff no longer produces that notice
(08:45:57) nn-: what did you do?
(08:47:32) bantu: I originally changed $user->lang['FULLTEXT_MYSQL_TOTAL_POSTS'] => ($this->index_created()) ? $this->stats['total_posts'] : 0,
(08:47:38) bantu: to $user->lang['FULLTEXT_MYSQL_TOTAL_POSTS'] => $this->stats['total_posts']
(08:48:06) bantu: I've undone that and replaced it with
(08:48:53) bantu: hmm, can't find the original commit right now
(08:49:10) nw-: i should have it in gitk somewhere
(08:49:30) nw-: ed4b867462bd6200e51643e77e001ccce9be9688
(08:49:47) nw-: so you reverted the first hunk of that
(08:49:47) bantu: yeah
(08:49:58) bantu: I replaced with
(08:50:19) nw-: yep that should do it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.