itarato/io_reopen.md

## io_reopen.md

      
    Raw
  

              io_reopen.md
            
          
    IO#reopen and its surprising side effect

This is a short debugging story that led to a few learnings. Our team is working on making TruffleRuby support one of our larger internal Ruby application. Running tests in CI we've noticed a flaky test failure reporting:
undefined method 'write_without_cli_ui' for #<IO:<STDOUT>> (NoMethodError)
Debugging

While this looked an obvious mistake, our initial findings were confusing. Our application in test was using the Cli-UI gem, where one adjustment the gem is making is aliasing the STDOUT.write method, so it can do writes with extra features.
To exclude basic mistakes we've confirmed:

the aliased method write_without_cli_ui is never called outside of the gem (or this context), but purely from the re-defined STDOUT.write method
this gem code is the only place where this specific metaprogramming has been applied
this code (method aliasing and redefinition) is applied once
there is always only one STDOUT object used (same as $stdout)

We've also confirmed that when the error happens, the missing method had already been applied. We looked into a few theories how a method can be undefined (other metaprogramming, serialization/deserialization), but all failed.
At this point we've exhausted all trivial culprits, and started to look into the deeper layers, by examining:

running process / thread changes
object / class / singleton class

Here we've found our first surprising revelation: STDOUT's singleton class is different at the point of error, while everything else remained the same (same object in memory).
We've looked into the CRuby implemantation of IO, and stumbled upon IO#reopen:
RBASIC_SET_CLASS(io, rb_obj_class(nfile));
(or the equivalent code in TruffleRuby):
Primitive.vm_set_class self, Primitive.class(io)
This is a method that replaces the unrelying object's class without providing a new object. The documentation also has a slight hint to it:
Reassociates the stream with another stream, which may be of a different class. This method may be used to redirect an existing stream to a new destination.

It didn't take long to find, where this was called, through ActiveSupport::Testing::Stream#capture.
We still had a trouble explaining the error. If a singleton class has been replaced, and the specific write_without_cli_ui method is only called via the redefined #write method, which also only lived on the sigleton class, how to we still call these methods after #reopen removed the singleton class from STDOUT?
That is a little bit more tricky and can be explained with how TruffleRuby is managing singleton class lookups. It had a basic expectation that an object's singleton class cannot be replaced. As such, as a performance measurement TruffleRuby cached the singleton class AST node on the first time of opening it (lazily), and kept using it on later class lookups.
The fix

Reproducing and writing a regression test case for it was also non obvious. TruffleRuby can optimize some forms of method calls to a simple dispatch node, which did not need a class/method lookup. With that in mind we can drive the #write call to an incorrectly cached singleton class instance, and the write_without_cli_ui call to the fresh one:
# In TruffleRuby:

def indirection(io)
	io.foo
end

$stdout.define_singleton_method(:foo) {}
$stdout.reopen(File.open(__FILE__))

indirection($stdout) # => undefined method 'foo' for #<IO:<STDOUT>> (NoMethodError)
The fix at this point was rather trivial: oracle/truffleruby#3088
We've opened a discussion on the Ruby project to see how can we reduce the surprise effect with this behaviour. While the fix for TruffleRuby removes the error, it does not prevent IO#reopen wiping the object's class/singleton class, which could disable added functionality to IO objects.