@burke
Created March 26, 2020 18:57
# typed: false
# frozen_string_literal: true
# Note: This is loaded *ludicrously* early in the boot process: please don't
# introduce other dependencies here.
# ===
# Okay so here's the deal. When we compile a bundle of ruby code via
# `bundlerEnv`, we install all the gems individually into a bunch of separate
# paths in the Nix Store. That means paths exist like:
#
# * /nix/store/hpp629q8gc0015ymk10fh74hsl71c048-ruby2.6.5-rmagick-4.0.0
# * /nix/store/n4ndzcaw8mgmaiyxy2wk9q91n02lqpm8-ruby2.6.5-nokogiri-1.10.7
#
# However, there's a function that compiles all these into a single $LOAD_PATH
# entry, which Bundler understands. That bundle is a big bag of symlinks that
# actually point to those individually-installed gems above. For example:
#
# /nix/store/z5962m0a0aw6989mab0sg3bx8611c3a6-bundler-dev-env-dev
# └── lib/ruby/gems/2.6.0/gems
#     └── ansi-1.5.0 -> /nix/store/8fr4...-ruby2.6.5-ansi-1.5.0/lib/ruby/gems/2.6.0/gems/ansi-1.5.0/
#
# So this all works great, except that this means there are two possible paths
# by which any given source file can be required: the Bundle path, and the
# Resolved path. The problem is that ruby tracks which files are already loaded
# by adding the full expanded path to an array called `$LOADED_FEATURES`, and
# automatically returning from `require` if the target file is already present
# in that array. When we require a file twice -- first by one path, then by the
# other -- we can double-load the file, which for some source files will cause
# warnings or even breakage.
#
# So, we need to find a way to prevent loading the same source from both
# locations. It might be useful to think about this in terms of Category
# Theory:
#
# Let's think of that Bundle path -- the one we want -- as category B, and the
# Resolved path as category R. There are morphisms from B->R, which is to say
# methods for resolving the symlinks from a bundle path into a resolved path:
#
# * __dir__ takes you from B→R, by following the symlinks before returning a
#   path.
# * require_relative takes you from B→R: calling it from a file loaded in
#   category B loads the requested file in category R.
# * explicitly calling File.realpath takes a path from B to R.
#
# We can also implement an R→B morphism -- turn a resolved path back into a
# bundled path -- by manually rewriting the path.
#
#                     d, r, p                         d : __dir__
#   ┏━━━━━━━━━━━━━┓  ━━━━━━━⯈  ┏━━━━━━━━━━━━━━┓       r : require_relative
#   ┃ B (bundled) ┃            ┃ R (resolved) ┃       p : File.realpath
#   ┗━━━━━━━━━━━━━┛  ⯇━━━━━━━  ┗━━━━━━━━━━━━━━┛       m : manual rewrite
#                       m
#
# What we want, in order to keep files getting loaded only once, is to
# intercept file load calls and force them to always be passed to ruby's
# built-in file loading mechanism in one category or the other, but not a mix
# of the two: $LOADED_FEATURES should be a collection of objects in just one of
# B or R.
#
# It feels conceptually appealing for $LOADED_FEATURES to contain objects in
# category B, by applying that `m` morphism to an object in category R if
# necessary when it's incoming to `require`. However, in practice there are
# just too many different code paths that resolve paths from B→R to be
# confident we've caught everything: ruby REALLY likes to resolve paths, so the
# path of least resistance is to collect objects in category R.
#
# The easiest way to collect only paths in category R is to just rewrite paths
# -- resolve their symlinks -- as they're added to $LOAD_PATH, which is what we
# do here.
#
# Another potentially-viable option, or maybe a supplementary strategy if we
# turn out to need it, would be to intercept calls to require,
# require_relative, load, and autoload, like we do in bootsnap.
module NixStoreSafeguard
  def self.realpath(path)
    return path unless path.to_s.start_with?('/nix/store/')
    real = File.realpath(path)
    # If path == real, the directory wasn't a symlink. It probably contains
    # symlinks, and needs further resolution...
    path == real ? (resolve(path) || path) : real
  # There are some non-existent paths that we try to add to the $LOAD_PATH
  # for some reason. This is weird behaviour but not really invalid, so we'll
  # tolerate it.
  rescue Errno::ENOENT
    path
  end

  # When a git clone is shared by multiple gems (i.e. rails), the directory
  # itself won't be a symlink, but each file inside will symlink to the same
  # eventual target. This is a stupid hack to use the ultimate target
  # directory rather than the bag o' symlinks. In practice this basically only
  # gets run once for each rails/rails gem we add to the LOAD_PATH, and I
  # highly doubt the recursive case is ever hit.
  def self.resolve(dir, *sub)
    currdir = File.join(dir, *sub)
    (Dir.entries(currdir) - %w(. ..)).each do |entry|
      path = File.join(currdir, entry)
      stat = File.lstat(path)
      if stat.symlink?
        real = File.realpath(path)
        # Strip the entry name and each nested subdirectory to get back to
        # the directory that corresponds to `dir`.
        (sub.size + 1).times { real = File.dirname(real) }
        return(real)
      elsif stat.file?
        return(dir)
      elsif stat.directory?
        if (res = resolve(dir, *sub, entry))
          return res
        end
      end
    end
    nil
  end

  def self.realpaths(paths)
    paths.map { |p| realpath(p) }
  end

  # Prepended to $LOAD_PATH's singleton class so that every entry added to
  # it passes through NixStoreSafeguard.realpath first.
  module ArrayMixin
    def <<(entry)
      super(NixStoreSafeguard.realpath(entry))
    end

    def push(*entries)
      super(*NixStoreSafeguard.realpaths(entries))
    end

    def unshift(*entries)
      super(*NixStoreSafeguard.realpaths(entries))
    end

    def concat(entries)
      super(NixStoreSafeguard.realpaths(entries))
    end

    def insert(index, *entries)
      super(index, *NixStoreSafeguard.realpaths(entries))
    end

    # Not intercepted: raise instead of letting an unresolved path through.
    def []=(*)
      raise(NotImplementedError)
    end

    # The block must be captured explicitly; with `(*)` the name `block`
    # was undefined here and any call raised NameError.
    def collect!(&block)
      super { |entry| NixStoreSafeguard.realpath(block.call(entry)) }
    end

    def map!(&block)
      super { |entry| NixStoreSafeguard.realpath(block.call(entry)) }
    end

    def replace(entries)
      super(NixStoreSafeguard.realpaths(entries))
    end

    # Not intercepted: raise instead of letting an unresolved path through.
    def fill(*)
      raise(NotImplementedError)
    end
  end
end
$LOAD_PATH.singleton_class.prepend(NixStoreSafeguard::ArrayMixin)
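
The interception technique used on `$LOAD_PATH` above can be tried in isolation on an ordinary array. The following is only a minimal sketch under stated assumptions: `ResolvingAppend` and `demo_resolving_append` are hypothetical names that are not part of the gist, the mixin covers just `<<` and `push`, and it resolves every entry rather than only those under `/nix/store/`. A temporary directory with one symlink stands in for the bundled and resolved paths.

```ruby
require 'tmpdir'

# Hypothetical, cut-down stand-in for NixStoreSafeguard::ArrayMixin.
# Assumption: every entry is resolved, with no /nix/store/ prefix check.
module ResolvingAppend
  def <<(entry)
    super(File.realpath(entry))
  end

  def push(*entries)
    super(*entries.map { |e| File.realpath(e) })
  end
end

# Builds a "resolved" directory and a "bundled" symlink pointing at it,
# then appends both spellings to a patched array and to a plain array.
def demo_resolving_append
  Dir.mktmpdir do |tmp|
    base = File.realpath(tmp) # the temp dir itself may sit behind a symlink
    resolved = File.join(base, 'resolved')
    bundled = File.join(base, 'bundled')
    Dir.mkdir(resolved)
    File.symlink(resolved, bundled)

    # Only this one array object is patched; other arrays are untouched.
    patched = []
    patched.singleton_class.prepend(ResolvingAppend)
    patched << bundled
    patched.push(bundled, resolved)

    plain = []
    plain << bundled

    { patched: patched, plain: plain, resolved: resolved, bundled: bundled }
  end
end

result = demo_resolving_append
puts "patched array holds only resolved paths: #{result[:patched].uniq == [result[:resolved]]}"
puts "plain array kept the symlink path: #{result[:plain] == [result[:bundled]]}"
```

Prepending to the singleton class changes only that one array object, which is why the gist can alter `$LOAD_PATH` without touching `Array` globally. Any mutating method left unwrapped still lets unresolved paths through, which is presumably why the gist raises from `[]=` and `fill` rather than ignoring them.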
@lilyball

lilyball commented Aug 4, 2020

I was wondering why I never saw this issue with `bundlerApp` and I think I figured it out. The binary exposed by `bundlerApp` does a `require 'bundler/setup'` internally (well, `require 'bundler'; Bundler.setup()`, but it's the same thing) before loading any gems, and the bundler setup step configures `$LOAD_PATH` with the realpaths of all gems. I can still reproduce the issue using the `wrappedRuby` of the app though.

I'm thinking the simplest fix here might simply be to update wrappedRuby to prepend -rbundler/setup to RUBYOPT. It already effectively does this using irbrc for the nested env attribute (for use with nix-shell), but not for ruby and not for either with the wrappedRuby attribute.

@erikarvstedt

@lilyball, this is a severe and hard-to-debug issue that affects all users of bundlerEnv. Can I help you fix this?

For example, the vagrant derivation, which wraps its binary by setting GEM_PATH to a bundler env, was affected by this bug. A brave user, who seems to have spent a long time debugging, added this workaround, which amounts to a manual version of copyGemFiles = true.
Here's another related issue.

I think the hardlink approach would be the best solution, as it prevents misusing bundler envs at the lowest level. But I don't feel competent to judge as I'm lacking deep familiarity with nixpkgs' Ruby architecture.
