Created April 4, 2011 05:14
[21:27] <@dillon> the current implementation is pretty bad, though it is using VOP_READ so there should be read-ahead by virtue of it going through the filesystem
[21:27] <@vsrinivas> also it's mplocked.
[21:28] <@vsrinivas> actually looks not; just around the fd work.
[21:28] <@thesjg> so feel free to drop all your insight on me and i'll make note of it
[21:28] <@thesjg> but i'm not prepared to do it quite yet
[21:28] <@dillon> it shouldn't need that mplock since it's holding the fp
[21:29] <@vsrinivas> ummm -- in the core loop it ssb_lock()s;
[21:29] <@vsrinivas> tcp output needs the ssb lock also iirc to read from that queue.
[21:29] <@dillon> w/ regards to the rest I think it can pass a larger value in the ioflags it passes to VOP_READ ... figure out what the maximum sequential heuristic value is there and pass that
[21:29] <@dillon> (in the upper 16 bits. See line 1694)
[21:30] <@vsrinivas> so sendfile will copy all the 4k pages into the socket to send, but the socket can't go until ssb_lock is released?
[21:30] <@dillon> right now it looks like it is passing a fairly small value and not reading ahead much
[21:31] <@dillon> jeeze, is it holding the sockbuf locked through the VOP_READ?
[21:31] <@vsrinivas> dillon: is that accurate (ssb_lock excludes tcp output path?)
[21:31] <@vsrinivas> yes.
[21:31] <@dillon> it certainly does not need to do that
[21:31] <@dillon> yah, that will mess up the tcp stack
[21:32] <@dillon> it can definitely release the lock around the VOP_READ, and probably other places too
[21:32] <@vsrinivas> also -- it's currently holding the vm token -- but iirc vm_page_lookup(obj ...) needs to hold the vmobj token?
[21:32] <@dillon> I'm not sure it needs the ssb_lock at all, why does it get it?
[21:32] <@dillon> it's holding it around so_pru_send
[21:32] <@vsrinivas> to quote the comment: '* Protect against multiple writers to the socket.'
[21:33] <@thesjg> um
[21:33] <@dillon> ah
[21:33] <@dillon> it won't mess up the protocol side then
[21:33] <@dillon> sends are messaged so the userland side doesn't queue to the sockbuf directly
[21:34] <@vsrinivas> wha?
[21:34] <@vsrinivas> } else if (vm_page_sleep_busy(pg, TRUE, "sfpbsy")) {
[21:34] <@vsrinivas> lwkt_reltoken(&vm_token);
[21:34] <@dillon> still, it probably doesn't need the lock; there is no expectation of atomicity if multiple threads are running writes to the same socket via sendfile()
[21:34] <@vsrinivas> that's a sad bit of code. sleep; wake up to get the vm token. release the token.
[21:34] <@dillon> sendfile is manually manipulating VM pages. VM pages that are in a PG_BUSY or m->busy state are in flux
[21:34] <@dillon> that case is going to be fairly rare
[21:35] <@dillon> if it catches up to the read-ahead it will hit soft-busied pages whose reads are in progress from prior VOP_READ readaheads
[21:36] <@dillon> it should be doing read-ahead ok, even though the implementation is poor; just fix the parameter sent to VOP_READ()
[21:36] <@dillon> you can test by sendfile'ing a file from a freshly mounted (uncached) filesystem and seeing if it maxes out the disk's sequential throughput
[21:36] <@thesjg> ok, sounds easy enough
[21:38] <@vsrinivas> okay; to do this sorta better.... would it make sense to start by building sfbufs for a ton of pages first.... then interleave piping sfbufs+mbufs into the so_send path (returning to sfbuf'ing when the sosend path is draining well)?
[21:38] <@vsrinivas> something like double-buffering between the two?
[21:39] <@dillon> no, the VOP_READ it issues is definitely correct; you want to do that because the filesystem will implement the read-ahead for you
...
[21:56] <@dillon> hmm. you know, it occurs to me that a lot of read-ahead wouldn't be beneficial anyway, since it is going to keep the sockbuf full
[21:57] <@thesjg> the problem is it doesn't keep the sockbuf full
[21:57] <@thesjg> once you get 50-60 connections being sendfile(2)'d to, each one a different file, the disk is so busy doing seeks the sockbufs never fill
[21:57] <@thesjg> if you are doing more readahead you lessen the # of seeks the disk has to bear
[21:58] <@dillon> hmm. That would be a hysteresis issue in cluster_read() I would guess
[21:58] <@dillon> something to definitely look into. Even though cluster_read() does read-ahead it can wind up nickel-and-diming the disk
[21:58] <@thesjg> well, i haven't tested it on dragonfly, but at least that's the issue i have hit time and again for years and years on freebsd
[21:58] <@dillon> e.g. if you read at offset 0 it reads, say, 128KB
[21:59] <@dillon> when you read() at offset 4096 it reads ahead to the 132K mark, reading just 4K more
[21:59] <@dillon> that could cause the disk to thrash
[21:59] <@thesjg> ah, sure, i see
[21:59] <@dillon> there is a read-ahead mark, B_RAM, that's supposed to deal with this
[22:00] <@dillon> you could add some debugging to print out the offsets B_RAM is being set at to make sure the read-ahead is working on properly sized chunks
[22:00] <@dillon> i.e. B_RAM should be set about half way through the read-ahead if the buffer cache buffer is not B_CACHE
[22:01] <@dillon> if the buffer cache buffer is cached (B_CACHE) then the next read-ahead is triggered when a B_RAM mark is found, setting the next B_RAM mark ahead another X KB.
[22:01] <@vsrinivas> (B_CACHE is set when? also, is there a bflag for swapcache reads?)
[22:01] <@vsrinivas> ((anyone got a chance to look at the mpipe ctor path?
[22:01] <@dillon> B_CACHE means a buffer cache buffer has valid data
[22:01] <@vsrinivas> otherwise, i'll commit :D))
[22:01] <@vsrinivas> okay.
[22:01] <@dillon> in any case, worth testing to make sure it is working properly.
[22:02] <@vsrinivas> ((yep. already ran a vkernel with it))
[22:02] <@dillon> Lots of parallel sendfile()s will thrash the disk but should still get good throughput if the read-ahead is sufficient
[22:02] <@thesjg> once i get another piece of hardware here running DF some of this testing will get easier
[22:02] <@vsrinivas> the ssb lock around it should definitely be fixed though.