@burke
Created March 26, 2020 18:57
# typed: false
# frozen_string_literal: true
# Note: This is loaded *ludicrously* early in the boot process: please don't
# introduce other dependencies here.
# ===
# Okay so here's the deal. When we compile a bundle of ruby code via
# `bundlerEnv`, we install all the gems individually into a bunch of separate
# paths in the Nix Store. That means paths exist like:
#
# * /nix/store/hpp629q8gc0015ymk10fh74hsl71c048-ruby2.6.5-rmagick-4.0.0
# * /nix/store/n4ndzcaw8mgmaiyxy2wk9q91n02lqpm8-ruby2.6.5-nokogiri-1.10.7
#
# However, there's a function that compiles all these into a single $LOAD_PATH
# entry, which Bundler understands. That bundle is a big bag of symlinks that
# actually point to those individually-installed gems above. For example:
#
# /nix/store/z5962m0a0aw6989mab0sg3bx8611c3a6-bundler-dev-env-dev
# └── lib/ruby/gems/2.6.0/gems
#     └── ansi-1.5.0 -> /nix/store/8fr4...-ruby2.6.5-ansi-1.5.0/lib/ruby/gems/2.6.0/gems/ansi-1.5.0/
#
# So this all works great, except that this means there are two possible paths
# by which any given source file can be required: the Bundle path, and the
# Resolved path. The problem is that ruby tracks which files are already loaded
# by adding the full expanded path to an array called `$LOADED_FEATURES`, and
# automatically returning from `require` if the target file is already present
# in that array. When we require a file twice -- first by one path, then by the
# other -- we can double-load the file, which for some source files will cause
# warnings or even breakage.
#
# So, we need to find a way to prevent loading the same source from both
# locations. It might be useful to think about this in terms of Category
# Theory:
#
# Let's think of that Bundle path -- the one we want -- as category B, and the
# Resolved path as category R. There are morphisms from B->R, which is to say
# methods for resolving the symlinks from a bundle path into a resolved path:
#
# * __dir__ takes you from B→R, by following the symlinks before returning a
# path.
# * require_relative takes you from B→R: calling it from a file loaded in
# category B loads the requested file in category R.
# * explicitly calling File.realpath takes a path from B to R.
#
# We can also implement an R→B morphism -- turn a resolved path back into a
# bundled path -- by manually rewriting the path.
#
#                 d, r, p                     d : __dir__
# ┏━━━━━━━━━━━━━┓ ━━━━━━━⯈ ┏━━━━━━━━━━━━━━┓   r : require_relative
# ┃ B (bundled) ┃          ┃ R (resolved) ┃   p : File.realpath
# ┗━━━━━━━━━━━━━┛ ⯇━━━━━━━ ┗━━━━━━━━━━━━━━┛   m : manual rewrite
#                    m
#
# What we want, in order to keep files getting loaded only once, is to
# intercept file load calls and force them to always be passed to ruby's
# built-in file loading mechanism in one category or the other, but not a mix
# of the two: $LOADED_FEATURES should be a collection of objects in just one of
# B or R.
#
# It feels conceptually appealing for $LOADED_FEATURES to contain objects in
# category B, by applying that `m` morphism to an object in category R if
# necessary when it's incoming to `require`. However, in practice there are
# just too many different code paths that resolve paths from B→R to be
# confident we've caught everything: ruby REALLY likes to resolve paths, so the
# path of least resistance is to collect objects in category R.
#
# The easiest way to collect only paths in category R is to just rewrite paths
# -- resolve their symlinks -- as they're added to $LOAD_PATH, which is what we
# do here.
#
# Another potentially-viable option, or maybe a supplementary strategy if we
# turn out to need it, would be to intercept calls to require,
# require_relative, load, and autoload, like we do in bootsnap.
module NixStoreSafeguard
  def self.realpath(path)
    return path unless path.to_s.start_with?('/nix/store/')
    real = File.realpath(path)
    # If path == real, the directory wasn't a symlink. It probably contains
    # symlinks, and needs further resolution...
    path == real ? (resolve(path) || path) : real
  # There are some non-existent paths that we try to add to the $LOAD_PATH
  # for some reason. This is weird behaviour but not really invalid, so we'll
  # tolerate it.
  rescue Errno::ENOENT
    path
  end
  # When a git clone is shared by multiple gems (e.g. rails), the directory
  # itself won't be a symlink, but each file inside will symlink to the same
  # eventual target. This is a stupid hack to use the ultimate target
  # directory rather than the bag o' symlinks. In practice this basically only
  # gets run once for each rails/rails gem we add to the LOAD_PATH, and I
  # highly doubt the recursive case is ever hit.
  def self.resolve(dir, *sub)
    currdir = File.join(dir, *sub)
    (Dir.entries(currdir) - %w(. ..)).each do |entry|
      path = File.join(currdir, entry)
      stat = File.lstat(path)
      if stat.symlink?
        real = File.realpath(path)
        (sub.size + 1).times { real = File.dirname(real) }
        return(real)
      elsif stat.file?
        return(dir)
      elsif stat.directory?
        if (res = resolve(dir, *sub, entry))
          return res
        end
      end
    end
    nil
  end
  def self.realpaths(paths)
    paths.map { |p| realpath(p) }
  end
  module ArrayMixin
    def <<(entry)
      super(NixStoreSafeguard.realpath(entry))
    end
    def push(*entries)
      super(*NixStoreSafeguard.realpaths(entries))
    end
    def unshift(*entries)
      super(*NixStoreSafeguard.realpaths(entries))
    end
    def concat(entries)
      super(NixStoreSafeguard.realpaths(entries))
    end
    def insert(index, *entries)
      super(index, *NixStoreSafeguard.realpaths(entries))
    end
    def []=(*)
      raise(NotImplementedError)
    end
    # Capture the block explicitly; `block` was otherwise undefined here.
    def collect!(&block)
      super { |entry| NixStoreSafeguard.realpath(block.call(entry)) }
    end
    def map!(&block)
      super { |entry| NixStoreSafeguard.realpath(block.call(entry)) }
    end
    def replace(entries)
      super(NixStoreSafeguard.realpaths(entries))
    end
    def fill(*)
      raise(NotImplementedError)
    end
  end
end
$LOAD_PATH.singleton_class.prepend(NixStoreSafeguard::ArrayMixin)
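
The singleton-class prepend above can be tried out on an ordinary array, without touching the real $LOAD_PATH. What follows is only a minimal sketch under some assumptions: `RealpathOnAppend` is a hypothetical stand-in for `NixStoreSafeguard::ArrayMixin`, it resolves every entry rather than only those under /nix/store/, and it covers just `<<` and `push`.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical, simplified stand-in for NixStoreSafeguard::ArrayMixin.
# Unlike the original, it resolves every path, not only /nix/store/ ones.
module RealpathOnAppend
  def <<(entry)
    super(RealpathOnAppend.resolve(entry))
  end

  def push(*entries)
    super(*entries.map { |e| RealpathOnAppend.resolve(e) })
  end

  # Resolve symlinks; tolerate paths that don't exist, as the original does.
  def self.resolve(path)
    File.realpath(path)
  rescue Errno::ENOENT
    path
  end
end

# Build a tiny "resolved" directory and a "bundled" symlink pointing at it.
# The temp dir itself is resolved first, since it may sit behind a symlink.
base = File.realpath(Dir.mktmpdir)
at_exit { FileUtils.remove_entry(base) }
resolved = File.join(base, 'resolved')
bundled = File.join(base, 'bundled')
FileUtils.mkdir_p(resolved)
File.symlink(resolved, bundled)

# Only this one array gets the override; other arrays are unaffected.
load_path = []
load_path.singleton_class.prepend(RealpathOnAppend)
load_path << bundled
load_path.push(resolved, File.join(base, 'missing'))

puts load_path
```

Both the symlinked and the resolved directory end up stored as the same resolved path, and the non-existent path is kept as given.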
@shepting

shepting commented Aug 3, 2020

This is great! I was running into this same issue as well and created https://github.com/shepting/ruby-nix-sample to try to debug (and at least show) those loading issues. @lilyintech discovered a workaround: adding copyGemFiles = true; to the pkgs.bundlerEnv call, so that the env no longer uses symlinks and literally copies all of the files. So far it's working for us!
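
For anyone trying to confirm they are hitting this, one rough diagnostic is to group $LOADED_FEATURES by resolved path and look for files that were recorded under more than one name. This is a sketch only: `double_loaded` is a hypothetical helper, not part of Ruby, Bundler, or nixpkgs, and newer Ruby versions may already de-duplicate some of these cases on their own.

```ruby
# Hypothetical helper: returns a Hash mapping each resolved file path to the
# distinct paths it was recorded under, keeping only files seen via 2+ paths.
def double_loaded(features)
  features
    .select { |f| File.exist?(f) }          # skip built-ins such as "enumerator.so"
    .group_by { |f| File.realpath(f) }      # collapse symlinked and resolved paths
    .transform_values(&:uniq)
    .select { |_real, paths| paths.size > 1 }
end

# Report anything in the current process that was loaded via several paths.
double_loaded($LOADED_FEATURES).each do |real, paths|
  warn "#{real} was loaded via: #{paths.join(', ')}"
end
```

Run it at the end of boot (or from a console) in the affected environment; an empty report means no file was recorded under two different paths.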

@lilyball

lilyball commented Aug 3, 2020

This workaround certainly seems to work, but it's not the sort of thing that I'd be comfortable having bundlerEnv inject automatically. But the problem here is very real, and can be reproduced with a minimal Gemfile:

source "https://rubygems.org"
gem 'json'

(For some reason, when a compiled extension uses rb_require("path"), it can load from the wrong path even though the bad directory was never added to $LOAD_PATH.)

When I find the time I'm going to file a nixpkgs issue, I think bundlerEnv needs to just go ahead and always copy files. The only real downside I can think of is if I collect garbage then the independently-built gems will be collected, and will have to be redownloaded/rebuilt the next time the bundlerEnv is rebuilt. Or maybe bundlerEnv can explicitly hardlink the copied gems and somehow add a reference to the original to keep it around (heck, just write out a text file that embeds the path to each source gem). That way the individual gems won't get collected as long as any environment is using them. But this is something that can be debated in the issue that I will file.

@lilyball

lilyball commented Aug 4, 2020

I was wondering why I never saw this issue with bundlerApp and I think I figured it out. The binary exposed by bundlerApp does a require 'bundler/setup' internally (well, require 'bundler'; Bundler.setup() but it's the same thing) before loading any gems, and the bundler setup step configures $LOAD_PATH with the realpaths of all gems. I can still reproduce the issue using the wrappedRuby of the app though.

I'm thinking the simplest fix here might be to update wrappedRuby to prepend -rbundler/setup to RUBYOPT. It already effectively does this using irbrc for the nested env attribute (for use with nix-shell), but not for ruby, and not for either irb or ruby with the wrappedRuby attribute.
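
A quick way to see whether a given interpreter has had its $LOAD_PATH configured with realpaths is to list the entries that still pass through a symlink. This is a rough sketch; `symlinked_entries` is a hypothetical helper name and not something Bundler or nixpkgs provides.

```ruby
# Hypothetical helper: returns the entries whose fully resolved path differs
# from their expanded path, i.e. entries that go through at least one symlink.
def symlinked_entries(paths)
  paths.select do |entry|
    begin
      File.realpath(entry) != File.expand_path(entry)
    rescue Errno::ENOENT
      false # non-existent entries are tolerated, not reported
    end
  end
end

# Print any $LOAD_PATH entries of the current process that are not realpaths.
puts symlinked_entries($LOAD_PATH)
```

Comparing the output with and without -rbundler/setup in RUBYOPT should show whether the setup step has replaced the symlinked entries.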

@erikarvstedt

@lilyball, this is a severe and hard-to-debug issue that affects all users of bundlerEnv. Can I help you fix this?

For example, the vagrant derivation, which wraps its binary by setting GEM_PATH to a bundler env, was affected by this bug. A brave user, who seems to have spent a long time debugging, added this workaround, which amounts to a manual version of copyGemFiles = true.
Here's another related issue.

I think the hardlink approach would be the best solution, as it prevents misusing bundler envs at the lowest level. But I don't feel competent to judge as I'm lacking deep familiarity with nixpkgs' Ruby architecture.
