tmm1/gist:61762

## gistfile1.txt
>  - In ruby 1.8.x, what is the functional difference between rb_thread_schedule and rb_thread_select?

rb_thread_schedule() is the guts of the thread scheduler. It traverses
the linked list of threads (several times) to find the next one
to switch into. The function is long (250 lines) and messy, and covers
all the combinations of thread status (RUNNABLE, TO_KILL, STOPPED,
KILLED) and wait state (FD, SELECT, TIME, JOIN, PID).

If there are no threads doing i/o or waiting on a timeout,
rb_thread_schedule() picks another thread from the list (considering
thread priorities and states) and switches into it. When there is
i/o, it collects all the file descriptors associated with STOPPED
threads that are WAIT_FD or WAIT_SELECT and runs select() on them. It
also uses select() as a way to sleep, passing in the smallest timeout
associated with any WAIT_TIME threads.
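
To make the fd/timeout collection concrete, here's a rough sketch of
that one pass over the thread list. This is not the real eval.c code:
toy_thread, select_plan and build_select_plan are made-up names, the
list is a plain NULL-terminated one, and only read fds are tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical, simplified stand-ins for the 1.8 thread fields;
 * this is NOT the real rb_thread_t layout. */
enum wait_kind { WAIT_NONE = 0, WAIT_FD = 1, WAIT_TIME = 2 };

struct toy_thread {
    int wait_for;             /* bitmask of wait_kind values */
    int fd;                   /* meaningful when WAIT_FD is set */
    double delay;             /* seconds until wakeup, when WAIT_TIME is set */
    struct toy_thread *next;  /* NULL-terminated list (an assumption) */
};

struct select_plan {
    fd_set readfds;           /* every fd some thread is waiting on */
    int max_fd;               /* -1 when no thread waits on an fd */
    double timeout;           /* smallest delay, -1.0 when nobody sleeps */
};

/* One pass over the list: gather the fds to hand to select() and the
 * smallest timeout, which is how long select() is allowed to sleep. */
static void build_select_plan(const struct toy_thread *head,
                              struct select_plan *plan)
{
    const struct toy_thread *th;

    FD_ZERO(&plan->readfds);
    plan->max_fd = -1;
    plan->timeout = -1.0;

    for (th = head; th != NULL; th = th->next) {
        if (th->wait_for & WAIT_FD) {
            /* fd_set cannot hold descriptors >= FD_SETSIZE */
            assert(th->fd >= 0 && th->fd < FD_SETSIZE);
            FD_SET(th->fd, &plan->readfds);
            if (th->fd > plan->max_fd) plan->max_fd = th->fd;
        }
        if (th->wait_for & WAIT_TIME) {
            if (plan->timeout < 0 || th->delay < plan->timeout)
                plan->timeout = th->delay;
        }
    }
}
```

A caller would then pass plan.max_fd + 1, &plan.readfds and the timeout
(converted to a struct timeval) to select().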

In 1.8, rb_thread_schedule() is called every 10 milliseconds. When you
compile with --disable-pthread, ruby calls setitimer() as soon as an
additional thread is created, and the kernel sends the process a SIGVTALRM
every 10 milliseconds; ruby uses the signal handler to set
rb_thread_pending = 1, which lets it know it needs to call
rb_thread_schedule(). When you compile with --enable-pthread, a kernel
thread is spawned which sits in a loop, nanosleeping for 10
milliseconds and firing SIGVTALRM on the main thread.
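
The setitimer() variant is easy to reproduce outside of ruby. Below is
a minimal sketch of the pattern, assuming a POSIX system; the names
(toy_thread_pending, on_tick, start_tick_timer and friends) are mine,
not ruby's. The handler does nothing but set a flag, and the main
program polls that flag at a point where it is safe to switch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Stand-in for rb_thread_pending: set by the handler, polled elsewhere. */
static volatile sig_atomic_t toy_thread_pending = 0;

/* Signal handler: only sets the flag, which is async-signal-safe. */
static void on_tick(int signo)
{
    (void)signo;
    toy_thread_pending = 1;
}

/* Install the SIGVTALRM handler and arm a repeating virtual timer.
 * Returns 0 on success, -1 on failure. */
static int start_tick_timer(long interval_usec)
{
    struct sigaction sa;
    struct itimerval tv;

    assert(interval_usec > 0 && interval_usec < 1000000);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGVTALRM, &sa, NULL) != 0) return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_usec;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_VIRTUAL, &tv, NULL);
}

/* Disarm the timer again. Returns 0 on success, -1 on failure. */
static int stop_tick_timer(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof(off));
    return setitimer(ITIMER_VIRTUAL, &off, NULL);
}

/* Burn CPU until the flag is set. ITIMER_VIRTUAL only counts CPU time
 * spent in user mode, so sleeping here would never trigger the signal.
 * Returns 1 if a tick was seen, 0 if max_iterations ran out first. */
static int spin_until_tick(unsigned long max_iterations)
{
    volatile unsigned long i;
    for (i = 0; i < max_iterations; i++) {
        if (toy_thread_pending) return 1;
    }
    return toy_thread_pending ? 1 : 0;
}
```

Calling start_tick_timer(10000) gives the 10 millisecond interval
described above. In ruby the equivalent of spin_until_tick() is the
interpreter checking the flag and calling rb_thread_schedule().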

rb_thread_select(), on the other hand, is simply the ruby version of
select(). When you call select() from ruby, it invokes
rb_thread_select(), which adds the file descriptors you passed in to
the currently running thread and puts the thread in the WAIT_SELECT
state. Then it simply goes on to invoke rb_thread_schedule(), which
will take those fds along with any fds other threads care about and
call select() on them all.

Calling rb_thread_select() with no fds (like EM used to do in the
epoll/kqueue case) is simply a roundabout way of calling
rb_thread_schedule(). The function is more useful when you actually
have a thread waiting on i/o, as is the case with mysqlplus, which calls
rb_thread_select() on the mysql connection's file descriptor,
effectively putting that thread in WAIT_SELECT and letting other
threads run until the query's results are available.

>  - There is plenty of lore about rb_thread_select being very slow, any particular reason?

The problem is that rb_thread_select() uses rb_thread_schedule(),
which in turn uses select(). rb_thread_schedule() doesn't scale well
when you have a lot of threads or a lot of file descriptors, since the
function is invoked so often, has to traverse the list of threads
constantly and repeatedly builds up big lists of file descriptors to
pass into the kernel. Many of these problems are inherent to
select(): there's a max of (usually) 1024 fds it can handle, and the
performance gets worse as you increase the number of fds, or if you
have a sparsely filled fd_set.
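
The sparse case is worth spelling out. select() takes nfds, which is
the highest fd plus one, so the set has to be scanned up to that bit no
matter how few fds are actually in it. A small sketch (fill_fd_set is a
hypothetical helper, not something from ruby):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Fill `set` from `fds` and return the nfds value select() would need
 * (highest fd + 1). Returns -1 if any descriptor does not fit in an
 * fd_set, i.e. it is negative or >= FD_SETSIZE. */
static int fill_fd_set(const int *fds, size_t count, fd_set *set)
{
    int max_fd = -1;
    size_t i;

    assert(set != NULL);
    FD_ZERO(set);

    for (i = 0; i < count; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) return -1;
        FD_SET(fds[i], set);
        if (fds[i] > max_fd) max_fd = fds[i];
    }
    return max_fd + 1;
}
```

With just two descriptors, 3 and 900, this returns 901: two fds of
interest, but 901 bits for select() to walk on every call. A descriptor
equal to FD_SETSIZE (usually 1024) cannot be represented at all.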

>  - In ruby 1.9.x, is rb_thread_blocking_region basically equivalent to rb_thread_select?

Not really.. rb_thread_blocking_region is a way to run code outside
the 1.9 GIL. The use case here is primarily for IO and external
processes (popen). In EM for example, this is really useful because we
can run the epoll/kqueue blocking system calls, but still allow other
ruby threads to run at the same time.