This is a short debugging story that led to a few learnings. Our team is working on making TruffleRuby support one of our larger internal Ruby application. Running tests in CI we've noticed a flaky test failure reporting:
undefined method 'write_without_cli_ui' for #<IO:<STDOUT>> (NoMethodError)
While this looked an obvious mistake, our initial findings were confusing. Our application in test was using the Cli-UI gem, where one adjustment the gem is making is aliasing the STDOUT.write
method, so it can do writes with extra features.
To exclude basic mistakes we've confirmed:
- the aliased method
write_without_cli_ui
is never called outside of the gem (or this context), but purely from the re-definedSTDOUT.write
method - this gem code is the only place where this specific metaprogramming has been applied
- this code (method aliasing and redefinition) is applied once
- there is always only one
STDOUT
object used (same as$stdout
)
We've also confirmed that when the error happens, the missing method had already been applied. We looked into a few theories how a method can be undefined (other metaprogramming, serialization/deserialization), but all failed.
At this point we've exhausted all trivial culprits, and started to look into the deeper layers, by examining:
- running process / thread changes
- object / class / singleton class
Here we've found our first surprising revelation: STDOUT
's singleton class is different at the point of error, while everything else remained the same (same object in memory).
We've looked into the CRuby implemantation of IO, and stumbled upon IO#reopen:
RBASIC_SET_CLASS(io, rb_obj_class(nfile));
(or the equivalent code in TruffleRuby):
Primitive.vm_set_class self, Primitive.class(io)
This is a method that replaces the unrelying object's class without providing a new object. The documentation also has a slight hint to it:
Reassociates the stream with another stream, which may be of a different class. This method may be used to redirect an existing stream to a new destination.
It didn't take long to find, where this was called, through ActiveSupport::Testing::Stream#capture.
We still had a trouble explaining the error. If a singleton class has been replaced, and the specific write_without_cli_ui
method is only called via the redefined #write
method, which also only lived on the sigleton class, how to we still call these methods after #reopen
removed the singleton class from STDOUT
?
That is a little bit more tricky and can be explained with how TruffleRuby is managing singleton class lookups. It had a basic expectation that an object's singleton class cannot be replaced. As such, as a performance measurement TruffleRuby cached the singleton class AST node on the first time of opening it (lazily), and kept using it on later class lookups.
Reproducing and writing a regression test case for it was also non obvious. TruffleRuby can optimize some forms of method calls to a simple dispatch node, which did not need a class/method lookup. With that in mind we can drive the #write
call to an incorrectly cached singleton class instance, and the write_without_cli_ui
call to the fresh one:
# In TruffleRuby:
def indirection(io)
io.foo
end
$stdout.define_singleton_method(:foo) {}
$stdout.reopen(File.open(__FILE__))
indirection($stdout) # => undefined method 'foo' for #<IO:<STDOUT>> (NoMethodError)
The fix at this point was rather trivial: oracle/truffleruby#3088
We've opened a discussion on the Ruby project to see how can we reduce the surprise effect with this behaviour. While the fix for TruffleRuby removes the error, it does not prevent IO#reopen
wiping the object's class/singleton class, which could disable added functionality to IO objects.