bcantrill/gist:8550468

## gistfile1.txt
MIME-Version: 1.0
Sender: bryancantrill@gmail.com
Received: by 10.68.33.97 with HTTP; Sat, 21 Dec 2013 00:11:05 -0800 (PST)
In-Reply-To: <CA+0vZFK2i8aq5DUQpt7wdk=QgodKOOEFgU7vz6n1ti0di-r4hw@mail.gmail.com>
References: <CANp9fE8t68+5_oVc=k=7oQPrSCNH5pnLZ2oP9x3YG8g4a9063A@mail.gmail.com>
	<CA+0vZFLRGYwysd8kO+KnazBG=vupfORTD+h1yj_dWur2f7LpPQ@mail.gmail.com>
	<CA+0vZFLpu5HFhJZLmpm01WOyzHjnaeHrKcuU7YEpMr6hEvyiRA@mail.gmail.com>
	<CAAm8y+jXcQ1ix5neUusW6NHXg+mb0V_ZNxrudjTaETdxhUQzKQ@mail.gmail.com>
	<F811D3C5-531A-492A-B390-5086F206DD6C@cheney.net>
	<CA+0vZFK2i8aq5DUQpt7wdk=QgodKOOEFgU7vz6n1ti0di-r4hw@mail.gmail.com>
Date: Sat, 21 Dec 2013 00:11:05 -0800
Delivered-To: bryancantrill@gmail.com
X-Google-Sender-Auth: 1-P9HY9dQxa-Mr_9AY_InMAQXGE
Message-ID: <CAAm8y+hbBWjw10E6CVjt+ROL-sOGeS1pnahppRVUoGfQHX2snA@mail.gmail.com>
Subject: Re: select and port problems on solaris port
From: Bryan Cantrill <bryan@joyent.com>
To: Aram Hăvărneanu <aram@mgk.ro>
Cc: Dave Cheney <dave@cheney.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hey Aram,

Thanks (as always) for your thoughts and guidance.  I thought about
this this week, and it seems that one major advantage we have is that
we are actually interposing on all system calls -- we know when the
runtime gets EAGAIN/EWOULDBLOCK/EINPROGRESS from a system call and can
use that to then (and only then) associate the event with the port.
This allows us to effectively replicate edge-triggered events on top
of level-triggered ones.  While it still needs some polish (e.g., to
dynamically size the PortPollDesc array), here's a patch that both (1)
passes all tests and (2) doesn't spin on Dave's test:

  http://us-east.manta.joyent.com/bcantrill/public/go/edge-triggered-port.diff.txt

Still needs some polish (e.g., I need to update the Perl script that
generates zsyscall_solaris_amd64.go), but I think that this route
seems promising.  What do you think?

        - Bryan


On Mon, Dec 16, 2013 at 7:32 AM, Aram Hăvărneanu <aram@mgk.ro> wrote:
> Oh, how I hate all this stuff in the name of performance.
>
> Edge-triggering is a hack done by people obsessed with reducing
> syscall count. Level triggering is so simple to understand and use.
> You block for something, and once you get it you can do whatever you
> want with it. You can process it synchronously, and then you don't
> have to do anything special, or you can process it asynchronously and
> then the only thing you have to do, the only thing, is to remember to
> add the fd back to the list of interesting fd's once you are done.
> This requirement is trivial, and in fact the explicitness of it is
> quite useful in understanding the code. You can read one byte, a
> hundred. Doesn't matter; you can do whatever you want.
>
> With edge-triggered events it's not like this. At first they seem
> useful because you don't have to remember to add fd's back, but that's
> a trivial requirement. The downsides are that it's spectacularly easy
> to deadlock, you have to process it asynchronously, and you have to
> process every single byte out of it even if the semantics of the
> application doesn't require it. The kernel does the bookkeeping for
> us, but the kernel doesn't know what we want from it, so the
> programmer has to adapt to a stricter and harder to use programming
> model.
>
> But we're optimised, so making programming harder is okay.
>
> I guess what we have to do is find a way to tell the poller every time
> when we are actually interested in a fd. Now we tell it once we create
> a socket, but we need to tell it after we process any kind of I/O.
> Then we can postpone shoehorning back fd's after we get events until
> we actually care about more events. Unfortunately, this seems very
> hard.
>
> A hack would be to somehow consume the connection-ready event in a
> safe manner. I think if we solve this, the problem goes away (maybe we
> need to do something about errors too). This would be very easy if the
> poller would simply offer hints to the scheduler. Of course, in the
> name of performance, the poller doesn't simply offer hints, it's vital
> to the scheduler function.
>
> I think the easiest way to implement that hack is to special-case
> WaitWrite for Connect (or rather, not use WaitWrite, but WaitConnect
> in Connect). On non-Solaris systems WaitConnect can simply be
> WaitWrite. On Solaris it won't (eventually) call runtime·netpoll, but it
> will block in port_get.
>
> --
> Aram Hăvărneanu