@tmm1
Created February 11, 2009 01:55
FAQ about MRI internals
> - In ruby 1.8.x, what is the functional difference between rb_thread_schedule and rb_thread_select?
rb_thread_schedule() is the guts of the thread scheduler. It traverses
the linked list of threads (several times) to find the next one to
switch into. The function is long (250 lines) and messy, and covers
all the combinations of thread status (RUNNABLE, TO_KILL, STOPPED,
KILLED) and wait state (FD, SELECT, TIME, JOIN, PID).
If there are no threads doing i/o or waiting on a timeout,
rb_thread_schedule() picks another thread from the list (considering
thread priorities and states) and switches into it. When there is
i/o, it collects all the file descriptors associated with STOPPED
threads that are WAIT_FD or WAIT_SELECT and runs select() on them. It
also uses select() as a way to sleep, passing in the smallest timeout
associated with any WAIT_TIME threads.
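To make the fd and timeout collection concrete, here is a rough Ruby
model of that step. It is only a sketch: ToyThread and
build_select_args are made-up names for illustration, and the real
logic is C code in eval.c working on MRI's own thread structs.

```ruby
# Toy model of how the 1.8 scheduler gathers arguments for select().
# ToyThread and build_select_args are hypothetical names, not MRI APIs.
ToyThread = Struct.new(:name, :status, :wait_for, :fds, :delay)

def build_select_args(threads)
  stopped = threads.select { |t| t.status == :stopped }

  # Every fd that a STOPPED thread in WAIT_FD or WAIT_SELECT cares about.
  fds = stopped.select { |t| [:fd, :select].include?(t.wait_for) }
               .flat_map(&:fds)
               .uniq

  # The smallest delay among WAIT_TIME threads becomes the select() timeout.
  timeout = stopped.select { |t| t.wait_for == :time }.map(&:delay).min

  [fds, timeout]
end

threads = [
  ToyThread.new("a", :runnable, nil,     [],     nil),
  ToyThread.new("b", :stopped,  :fd,     [5],    nil),
  ToyThread.new("c", :stopped,  :select, [7, 9], nil),
  ToyThread.new("d", :stopped,  :time,   [],     0.25),
  ToyThread.new("e", :stopped,  :time,   [],     0.05),
]

p build_select_args(threads) # => [[5, 7, 9], 0.05]
```

A nil timeout in this model means no thread is waiting on time, so the
select() call would be bounded only by i/o.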
In 1.8, rb_thread_schedule() is called every 10 milliseconds. When you
compile with --disable-pthread, ruby calls setitimer() as soon as an
additional thread is created, and the kernel sends the process a
SIGVTALRM every 10 milliseconds; ruby uses the signal handler to set
rb_thread_pending = 1, which lets it know it needs to call
rb_thread_schedule(). When you compile with --enable-pthread, a kernel
thread is spawned which sits in a loop, nanosleeping for 10
milliseconds and firing SIGVTALRM on the main thread.
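The flag-and-poll pattern can be sketched in plain Ruby. This is an
analogy only: run_toy_interpreter is a hypothetical helper, a Ruby
thread stands in for the SIGVTALRM source, and a local variable stands
in for rb_thread_pending.

```ruby
# Toy model of timer-driven rescheduling; not MRI code.
def run_toy_interpreter(duration, tick = 0.01)
  pending = false     # stands in for rb_thread_pending
  reschedules = 0

  # Stands in for the SIGVTALRM source (setitimer or the nanosleep loop).
  timer = Thread.new do
    loop do
      sleep tick
      pending = true
    end
  end

  deadline = Time.now + duration
  while Time.now < deadline
    sleep 0.001       # stand-in for executing a bit of Ruby code
    next unless pending

    pending = false
    reschedules += 1  # this is where 1.8 would call rb_thread_schedule()
  end

  timer.kill
  reschedules
end

puts run_toy_interpreter(0.2)
```

The point of the design is that the timer itself does almost nothing. It
only sets a flag, and the interpreter decides when it is safe to act on
it.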
rb_thread_select(), on the other hand, is simply the ruby version of
select(). When you call select() from ruby, it invokes
rb_thread_select(), which adds the file descriptors you passed in to
the currently running thread and puts that thread in WAIT_SELECT. Then
it simply goes on to invoke rb_thread_schedule(), which will take
those fds along with any other fds other threads care about and call
select() on them all.
Calling rb_thread_select() with no fds (like EM used to do in the
epoll/kqueue case) is simply a roundabout way of calling
rb_thread_schedule(). The function is more useful when you actually
have a thread waiting on i/o, as is the case with mysqlplus, which
calls rb_thread_select() on the mysql connection's file descriptor,
effectively putting that thread in WAIT_SELECT and letting other
threads run until the query's results are available.
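The same effect is visible from Ruby code. In the sketch below a pipe
stands in for the mysql connection's file descriptor (an assumption for
illustration; wait_while_others_run is a made-up helper). The calling
thread parks in IO.select while a second thread keeps working and
eventually produces the "result".

```ruby
# A pipe stands in for a database socket; names here are hypothetical.
def wait_while_others_run
  reader, writer = IO.pipe
  ticks = 0

  # Another Ruby thread that keeps running while the caller waits.
  worker = Thread.new do
    3.times do
      ticks += 1
      sleep 0.01
    end
    writer.write("done") # the "query result" arrives
    writer.close
  end

  # Blocks only the calling thread until the fd is readable (5s timeout).
  ready, = IO.select([reader], nil, nil, 5)
  data = ready ? reader.read : nil

  worker.join
  reader.close
  [data, ticks]
end

p wait_while_others_run # => ["done", 3]
```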
> - There is plenty of lore about rb_thread_select being very slow, any particular reason?
The problem is that rb_thread_select() uses rb_thread_schedule(),
which in turn uses select(). rb_thread_schedule() doesn't scale well
when you have a lot of threads or a lot of file descriptors, since the
function is invoked so often, has to traverse the list of threads
constantly, and repeatedly builds up big lists of file descriptors to
pass into the kernel. Many of these problems are inherent to
select(): there's a max of (usually) 1024 fds it can handle, and the
performance gets worse as you increase the number of fds, or if you
have a sparsely filled fd_set.
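The sparse case is easy to see with a little arithmetic. select() takes
nfds, the highest fd plus one, and examines every descriptor slot below
it, whether or not you asked about that slot. The sketch below models
only that cost; select_scan_cost is a made-up helper and 1024 is the
usual, but platform-dependent, FD_SETSIZE.

```ruby
# Toy cost model for select(); select_scan_cost is a hypothetical helper.
FD_SETSIZE = 1024 # typical value, platform dependent

def select_scan_cost(fds)
  return 0 if fds.empty?

  nfds = fds.max + 1 # what gets passed as select()'s first argument
  if nfds > FD_SETSIZE
    raise ArgumentError, "fd #{fds.max} does not fit in an fd_set"
  end

  nfds # number of descriptor slots examined per call
end

p select_scan_cost([3, 4, 5]) # => 6
p select_scan_cost([3, 1000]) # => 1001
```

Two descriptors can cost as much as a thousand if one of them happens to
have a high number, and anything at or above FD_SETSIZE cannot be
watched at all.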
> - In ruby 1.9.x, is rb_thread_blocking_region basically equivalent to rb_thread_select?
Not really. rb_thread_blocking_region is a way to run code outside
the 1.9 GIL. The use case here is primarily for IO and external
processes (popen). In EM for example, this is really useful because we
can run the epoll/kqueue blocking system calls, but still allow other
ruby threads to run at the same time.