Some gems require external dependencies, such as gems containing C extensions
that compile against external C libraries (ex: pg) or gems that wrap around
external command-line utilities (ex: graphviz). Presently there is no
mechanism to automatically install these external dependencies. Instead, it is
the user's responsibility to figure out the correct package name and install the
external dependency via their system's package manager (ex: apt or brew)
and attempt to install the gem again. Users often become frustrated when a
C extension fails to build and they then have to search Google and
Stack Overflow to find out which package needs to be installed.
Very rarely do gem authors list external dependencies via the gemspec
requirements attribute, and when they do, they usually list the canonical
project name of the external dependency (ex: sqlite3) and not the package name
(ex: libsqlite3-dev).
To work around the problem of external dependencies, some popular gems (ex:
nokogiri) have begun vendoring their external dependencies directly into the
gem and building both the C library and any C extensions during installation.
There are two downsides to this approach:
- Increased compilation time.
- Security Advisories: each time a new security advisory is published for the vendored library, the gem maintainer has to update the vendored copy of the C library, publish an updated version of the gem, and then publish their own security advisory warning users that the previous versions of their gem contain a vulnerable copy of the vendored library.
It should be possible to list the external dependencies required by a gem and have RubyGems automatically install them from the system's package manager during the installation process.
This external_dependencies metadata could be embedded in the gemspec's
metadata attribute. In order to support naming differences between different
package managers, the package names for the external dependencies would need to
be listed for each popular package manager.
gemspec.metadata['external_dependencies'] = {
  'apt' => %w[libsqlite3-dev],
  'dnf' => %w[sqlite-devel],
  'brew' => %w[sqlite],
  # ...
}

If only one external dependency needs to be specified, then the values of the
external_dependencies Hash could also be single package name Strings which
would later be automatically coerced into an Array:
gemspec.metadata['external_dependencies'] = {
  'apt' => 'libsqlite3-dev',
  'dnf' => 'sqlite-devel',
  'brew' => 'sqlite',
  # ...
}

However, this might be a bit confusing?
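The String-to-Array coercion described above could be sketched roughly as follows. The helper name `normalize_external_dependencies` is hypothetical and is not part of RubyGems; it only illustrates how single package names and lists of package names could be treated uniformly.

```ruby
# Hypothetical helper (not a RubyGems API): normalizes each value of the
# external_dependencies Hash into an Array of package-name Strings.
def normalize_external_dependencies(deps)
  deps.each_with_object({}) do |(package_manager, packages), normalized|
    # Array() wraps a single String and leaves an existing Array as-is.
    normalized[package_manager.to_s] = Array(packages).map(&:to_s)
  end
end

normalized = normalize_external_dependencies({
  'apt'  => 'libsqlite3-dev',
  'dnf'  => %w[sqlite-devel],
  'brew' => 'sqlite'
})
```

With this kind of normalization, the rest of the installation logic would only ever need to handle Arrays.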
If all of the package names are the same for each package manager,
then external_dependencies could be specified as an Array of Strings:
gemspec.metadata['external_dependencies'] = %w[nmap]

Multiple package managers may be installed on the same system. To determine the primary system package manager, it can be selected based on the OS/distro/flavor.
macOS is a unique edge case since it does not have a default package manager. So there should be a prioritized list of macOS package managers to check for, in order of popularity:
- brew
- ports
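A minimal sketch of this selection logic might look like the following. The OS identifiers, the mapping table, and the method names are all assumptions made for illustration. How the OS identifier is obtained is left out. Note that the MacPorts executable is named `port`, so that is the name the sketch looks for on the PATH.

```ruby
# Hypothetical mapping from an OS/distro identifier to its primary package manager.
PRIMARY_PACKAGE_MANAGERS = {
  'debian' => 'apt',
  'ubuntu' => 'apt',
  'fedora' => 'dnf'
}.freeze

# macOS has no default package manager, so candidates are tried in priority order.
MACOS_PACKAGE_MANAGERS = %w[brew port].freeze

# Returns true if an executable file with the given name exists on the PATH.
def executable_on_path?(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Returns the package manager name for the given OS identifier, or nil if unknown.
# The `installed` callable is injectable so the lookup can be tested without
# touching the real filesystem.
def primary_package_manager(os_id, installed: method(:executable_on_path?))
  if os_id == 'macos'
    MACOS_PACKAGE_MANAGERS.find { |package_manager| installed.call(package_manager) }
  else
    PRIMARY_PACKAGE_MANAGERS[os_id]
  end
end
```

On a Linux distro the answer comes straight from the table, even if other package managers are also installed. On macOS the first installed candidate in the priority list wins.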
If gems can specify package names that will be installed via an
apt-get install or brew install command, special care should be taken to
prevent arbitrary command injection. system() with multiple arguments
(ex: system('apt-get','install',...)) or Shellwords.shellescape must be
used to prevent command injection.
In order to prevent option injection via the gemspec's package names, an
argument of '--' or '-' can be specified before the package names to
terminate option parsing early, so that the package names are not
accidentally parsed as options.
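Both precautions can be combined when building the install command. The sketch below is only illustrative: `install_command` is a hypothetical helper, and whether each package manager actually honors '--' as an end-of-options marker is an assumption that would need to be verified per tool. Privilege escalation (ex: sudo) is deliberately left out.

```ruby
# Hypothetical helper: builds an argument vector for installing packages.
# The result is meant to be passed to system(*command), which executes the
# program directly without a shell, so shell metacharacters in package names
# are never interpreted.
def install_command(package_manager, packages)
  base = case package_manager
         when 'apt'  then %w[apt-get install]
         when 'brew' then %w[brew install]
         else raise ArgumentError, "unsupported package manager: #{package_manager}"
         end

  # '--' is placed before the package names so that a name such as
  # '--some-option' is treated as a package name and not as an option.
  base + ['--'] + packages
end

command = install_command('apt', ['libsqlite3-dev', '--allow-unauthenticated; echo pwned'])
# system(*command) would pass each element as one separate argument.
```

The malicious-looking second name above stays a single argument placed after '--', so it can at worst fail as an unknown package name.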
Some users may wish to customize which package manager is used on their system if they have multiple package managers installed alongside each other. It may be necessary to add a configuration option or environment variable to control which package manager is used by default. However, requests for this feature seem to be very rare.
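As a rough illustration, an environment variable override could be as small as the following. The variable name GEM_PACKAGE_MANAGER is purely hypothetical and is not an existing RubyGems setting.

```ruby
# Returns the user's override if one is set, otherwise the detected default.
# GEM_PACKAGE_MANAGER is a hypothetical variable name used only for this sketch.
def selected_package_manager(detected, env = ENV)
  override = env['GEM_PACKAGE_MANAGER'].to_s.strip
  override.empty? ? detected : override
end
```

An unset or blank variable falls back to whatever was detected, so users who never set it see no change in behavior.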
If we allow annotating the external dependencies and automatically
installing them along with the gem, the gemspec's requirements attribute
might no longer be necessary and could be deprecated in the future?
Thanks for drafting this! I'm not sure where you'd like comments, so I'll put some thoughts here. I'd be happy to write these up somewhere more permanent if you like.
Naming! Of course. And I apologize.
"External dependency" is what's used in this proposal for the same concept I call "system dependency" in https://github.com/flavorjones/ruby-c-extensions-explained and what native-package-installer calls a "native package". I don't feel strongly about naming, but "external dependency" isn't familiar to me and feels imprecise (aren't all dependencies external?). I'd like to encourage the use of a term that's already in use.

But I don't feel too strongly and will use "external dependency" in these comments.

Gem::Specification#metadata

All the proposals above use Hash data stored in Gem::Specification#metadata, but unfortunately metadata values must be Strings, which makes this attribute a bit challenging to use.

Perhaps this proposal could specify that #metadata should start allowing Hash values, or introduce a new attribute to hold external dependency information. But either way means challenges around backwards-compatibility with older versions of RubyGems.

Another option could be to borrow the approach from JRuby's jar-dependencies project (which defined Maven dependencies in the gemspec), which appends to the Gem::Specification#requirements array, which is an Array[String]. (I personally like #requirements a bit better because this seems to be exactly the intention behind the attribute!)

Detecting when the dependency is already installed

I think we need to provide a way to detect if the package has already been installed that handles both package-manager-managed packages as well as manually-compiled-and-installed libraries. I'd like to propose that we adopt the pkg-config name for the package (which perhaps naively presumes one exists).

The pkg-config names are consistent across platforms, this is what pkg-config is intended to do, and it seems like most distros come with pkg-config installed (we may want to verify this, but it's true for the main Linux distros, for Homebrew, and for msys2/mingw (I think)). Users can then set PKG_CONFIG_PATH to find manually-installed libraries in custom directories.

So a configuration might look something like (and I'm using #requirements and a JSON string for reasons explained above):

and at gem install time, RubyGems would do the following "presence check":

- use pkg-config to determine if a package named libxml-2.0 is present on the system

Handling multiple dependencies

If you agree with the previous section, then when there are multiple dependencies we will need multiple dependency definitions. This, I think, makes #requirements a better choice than #metadata, so that we can do something like:

Detecting which elements of #requirements are external dependencies will probably require a "syntax" of sorts, like that used by jar-dependencies, but maybe that's something simple like the characters "pkg " at the start.

Opt-in

I think the installation of external dependencies should require the user to opt in to the behavior. Perhaps the detection can always happen, but a sane failure message can be displayed telling the user how to opt in (like a --install-external-dependencies flag on gem install or bundle install).

Multiple packages might meet the requirements

There are going to be situations where multiple packages might meet the gem's requirements, for example mysql2, which builds with either libmysqlclient or libmariadb. How can we handle this? Maybe something like:

which would use pkg-config to check for the presence of either mariadb or mysqlclient before proceeding to the installation phase.

Another example of this is the sqlite3 gem, which can be satisfied by either sqlite or sqlcipher.

Toolchain

I think we should also give some thought to allowing a gem with an extension to define the toolchain necessary to build it, too. (Note that we have an entire section in the Nokogiri manual dedicated to this topic.)
For C extensions, maybe it's something like:
and the "cc" toolchain might reasonably be implemented under the hood as a set of external dependencies like this:
[ { "apt": "gcc", ... }, { "apt": "make", ... }, { "apt": "patch", ... } ]

This suggestion feels pretty squishy, but I recently had someone reach out to me to ask about adding Rust extension support to rake-compiler-dock (and there are existing C++ extensions) so it feels like we should at least think about how we might support multiple toolchains.
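To make the toolchain idea slightly more concrete, here is a rough sketch of how a named toolchain might expand into external dependencies. The TOOLCHAINS table and the `toolchain_packages` method are hypothetical. Only the apt package names given above are included, since names for other package managers are not specified here.

```ruby
# Hypothetical registry: each toolchain is a list of external dependency
# definitions, keyed by package manager name.
TOOLCHAINS = {
  'cc' => [
    { 'apt' => 'gcc' },
    { 'apt' => 'make' },
    { 'apt' => 'patch' }
  ]
}.freeze

# Returns the package names a toolchain needs for the given package manager.
# Definitions that have no entry for that package manager are skipped.
def toolchain_packages(toolchain, package_manager)
  definitions = TOOLCHAINS.fetch(toolchain) do
    raise ArgumentError, "unknown toolchain: #{toolchain}"
  end
  definitions.filter_map { |definition| definition[package_manager] }
end

cc_packages = toolchain_packages('cc', 'apt')
```

A Rust or C++ toolchain would then just be another entry in the registry, resolved through the same path as any other external dependency.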