@postmodern
Created July 9, 2022 01:38
Draft of the upcoming RubyGems external_dependencies RFC

[RFC] Allow specifying and installing external dependencies

The Problem

Some gems require external dependencies, such as gems containing C extensions that compile against external C libraries (ex: pg) or gems that wrap external command-line utilities (ex: graphviz). Presently, there is no mechanism to automatically install these external dependencies. Instead, it is the user's responsibility to figure out the correct package name, install the external dependency via their system's package manager (ex: apt or brew), and attempt installing the gem again. Users often become frustrated when a C extension fails to build and they then have to search Google and Stack Overflow for the package that needs to be installed.

Gem authors very rarely list external dependencies via the gemspec requirements attribute, and when they do, they usually list the canonical project name of the external dependency (ex: sqlite3) and not the package name (ex: libsqlite3-dev).

The Workaround

To work around the problem of external dependencies, some popular gems (ex: nokogiri) have begun vendoring their external dependencies directly into the gem and building both the C library and any C extensions during installation. There are two downsides to this approach:

  1. Increased compilation time.
  2. Security Advisories: each time a new security advisory is published for the vendored library, the gem maintainer has to update the vendored copy of the C library, publish an updated version of the gem, and then publish their own security advisory warning users that the previous versions of their gem contain a vulnerable copy of the vendored library.

The Proposal

It should be possible to list the external dependencies required by a gem and have RubyGems automatically install them from the system's package manager during the installation process.

This external_dependencies metadata could be embedded in the gemspec's metadata attribute. In order to support naming differences between different package managers, the package names for the external dependencies would need to be listed for each popular package manager.

Example:

  gemspec.metadata['external_dependencies'] = {
    'apt'  => %w[libsqlite3-dev],
    'dnf'  => %w[sqlite-devel],
    'brew' => %w[sqlite],
    # ...
  }

Sub-Proposal 1

If only one external dependency needs to be specified, then the values of the external_dependencies Hash could also be single package name Strings which would later be automatically coerced into an Array:

Example:

  gemspec.metadata['external_dependencies'] = {
    'apt'  => 'libsqlite3-dev',
    'dnf'  => 'sqlite-devel',
    'brew' => 'sqlite',
    # ...
  }

However, this might be a bit confusing?

Sub-Proposal 2

If all of the package names are the same for each package manager, then external_dependencies could be specified as an Array of Strings:

Example:

  gemspec.metadata['external_dependencies'] = %w[nmap]
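
The three shapes described above (a Hash of Arrays, a Hash of Strings, and a bare Array) could be normalized into one form before use. The following is only a minimal sketch: the helper name normalize_external_dependencies and the KNOWN_PACKAGE_MANAGERS list are hypothetical and not part of RubyGems.

```ruby
# Hypothetical list of package managers the installer knows about.
KNOWN_PACKAGE_MANAGERS = %w[apt dnf brew].freeze

# Normalizes the external_dependencies metadata into a Hash that maps
# each package manager name to an Array of package names.
def normalize_external_dependencies(value)
  case value
  when Hash
    # Sub-Proposal 1: coerce single Strings into one-element Arrays.
    value.transform_values { |packages| Array(packages) }
  when Array
    # Sub-Proposal 2: the same package names apply to every package manager.
    KNOWN_PACKAGE_MANAGERS.to_h { |manager| [manager, value.dup] }
  else
    raise ArgumentError, "unsupported external_dependencies value: #{value.inspect}"
  end
end
```

Normalizing up front would let the rest of the installer handle a single shape, which may reduce the confusion raised under Sub-Proposal 1.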

Caveats

Detecting the System's Package Manager

Multiple package managers may be installed on the same system. The primary package manager can be selected based on the OS/distro/flavor.

macOS is a unique edge case since it does not have a default package manager. There should therefore be a prioritized list of macOS package managers to check for, in order of popularity:

  1. brew
  2. port (MacPorts)
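
The detection order described above could be sketched roughly as follows. This assumes that Linux distros are identified by the ID field of /etc/os-release; the mapping table, the method names, and the injectable `available` check are all hypothetical and chosen only for illustration.

```ruby
# Hypothetical mapping from the ID field of /etc/os-release to the
# distro's primary package manager. A real table would be much longer.
DISTRO_PACKAGE_MANAGERS = {
  'debian' => 'apt',
  'ubuntu' => 'apt',
  'fedora' => 'dnf',
  'rhel'   => 'dnf'
}.freeze

# macOS package managers to probe for, most popular first.
# `port` is the MacPorts command.
MACOS_PACKAGE_MANAGERS = %w[brew port].freeze

# Extracts the ID field from the contents of an os-release file.
def os_release_id(contents)
  line = contents.each_line.find { |l| l.start_with?('ID=') }
  line && line.sub('ID=', '').strip.delete('"')
end

# Returns true if the command is found in one of the PATH directories.
def command_available?(command, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    File.executable?(File.join(dir, command))
  end
end

# Picks a package manager from the OS first, and only probes PATH on
# macOS, where there is no default package manager.
def detect_package_manager(host_os:, os_release: nil, available: method(:command_available?))
  if host_os.include?('darwin')
    MACOS_PACKAGE_MANAGERS.find { |manager| available.call(manager) }
  elsif os_release
    DISTRO_PACKAGE_MANAGERS[os_release_id(os_release)]
  end
end
```

A caller would pass something like RbConfig::CONFIG['host_os'] and the contents of /etc/os-release; a nil result means no supported package manager was found.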

Security Concerns

If gems can specify package names that will be installed via an apt-get install or brew install command, special care must be taken to prevent arbitrary command injection. Either system() with multiple arguments (ex: system('apt-get', 'install', ...)) or Shellwords.shellescape must be used to prevent command injection.

In order to prevent option injection via the gemspec's package names, an argument of '--' or '-' can be specified before the package names to terminate option parsing early, so that the package names are not accidentally parsed as options.
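
As a rough illustration of both precautions, the install command could be built as an argument list and passed to system() without going through a shell. The method names here are hypothetical, and the sketch assumes that the chosen package manager treats '--' as the end of options, which would need to be verified for each package manager.

```ruby
# Builds the argv for installing packages. The '--' argument ends option
# parsing so that a package name such as '--purge' is not treated as an
# option by the package manager.
def install_command(package_manager_argv, packages)
  [*package_manager_argv, '--', *packages]
end

# Runs the install command. Passing each argument to system() separately
# bypasses the shell, so shell metacharacters in package names are inert.
def install_packages(package_manager_argv, packages)
  system(*install_command(package_manager_argv, packages))
end
```

For example, install_command(%w[apt-get install], %w[libsqlite3-dev]) yields the argument list for apt-get install -- libsqlite3-dev.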

Configurability

Some users may wish to customize which package manager is used on their system if they have multiple package managers installed alongside each other. It may be necessary to add a configuration option or environment variable to control which package manager is used by default, although this feature request seems to be very rare.
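
One possible shape for such an override is sketched below. The environment variable name GEM_EXTERNAL_PACKAGE_MANAGER and the method name are hypothetical; RubyGems does not define either of them.

```ruby
# Hypothetical environment variable; RubyGems does not define it.
PACKAGE_MANAGER_ENV_VAR = 'GEM_EXTERNAL_PACKAGE_MANAGER'

# Returns the user's explicit choice when it is set and non-empty,
# otherwise the automatically detected package manager.
def choose_package_manager(detected, env = ENV)
  override = env[PACKAGE_MANAGER_ENV_VAR]
  override.nil? || override.empty? ? detected : override
end
```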

Possible Deprecations

If we allow annotating the external dependencies and automatically installing them along with the gem, the gemspec's requirements attribute might no longer be necessary and could be deprecated in the future?

Previous Work

@byroot

byroot commented Sep 13, 2022

Security Concerns

Should rubygems/bundler even attempt to install the package automatically though? On most systems that would require elevated permissions, which means a password prompt etc. And if you do a password prompt, you need to check whether you are in a TTY or not, otherwise you might hang forever etc.

IMHO just checking the package exists and giving a clear error message to the user with maybe the command to run should be enough, no?

@postmodern
Author

@byroot this could be addressed by simply checking the return value of sudo and printing the appropriate error message, or by checking $stdout.tty? and falling back to checking whether the package is already installed and, if not, printing an error message instructing the user to install the associated package name(s).
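
A minimal sketch of that fallback, assuming the list of missing packages has already been computed elsewhere. The method name install_or_report and the injectable runner are hypothetical and exist only so that the decision logic can be exercised without actually calling sudo.

```ruby
# Decides what to do with packages that are not installed yet.
# interactive: whether a password prompt could be answered (the
# $stdout.tty? check suggested above is used as the default).
# runner: hypothetical injection point; defaults to Kernel#system.
def install_or_report(missing, install_argv, interactive: $stdout.tty?, runner: method(:system))
  return :ok if missing.empty?

  command = [*install_argv, '--', *missing]
  # Only attempt sudo when a prompt is possible; a falsy return value
  # from the runner means the install failed.
  return :ok if interactive && runner.call('sudo', *command)

  warn "Missing external packages: #{missing.join(', ')}"
  warn "Install them with: sudo #{command.join(' ')}"
  :missing
end
```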

@kou

kou commented Sep 13, 2022

Add a new mkmf method called install_packages (or something similar) ...

This is the approach that pkg-config/native-package-installer does. If we're OK with the approach, we don't need to propose a new feature to RubyGems.

And if you do a password prompt, you need to check whether you are in a TTY or not, otherwise you might hang forever etc.

sudo fails without a TTY, so we don't need to do anything for this case.

native-package-installer already has this feature (https://github.com/ruby-gnome/native-package-installer/blob/master/lib/native-package-installer.rb#L60-L63), and it works on CI: it does not hang and raises an exception instead.

giving a clear error message to the user with maybe the command to run

native-package-installer already has this feature too (https://github.com/ruby-gnome/native-package-installer/blob/master/lib/native-package-installer.rb#L83-L87). The message is shown only when sudo ... fails.

@byroot

byroot commented Sep 13, 2022

sudo fails without a TTY.

It was one example among many. Docker container builds hanging forever because apt is waiting on a prompt are a common problem (generally solved with DEBIAN_FRONTEND=noninteractive). There is likely a very long tail of problems like this.

What I'm trying to get at is that I suspect rubygems/bundler will likely consider this a huge can of worms that they won't have any desire to support. That is why I'm suggesting including a scaled-down proposal in case they don't want to go all the way.

But it's merely a suggestion really.

@kou

kou commented Sep 14, 2022

There are many Dockerfiles that work without a TTY. So I think that there is no technical difficulty around TTY.

@voxik

voxik commented Sep 14, 2022

Several random notes (and apologies if something was already mentioned and I missed it).

Dependency specification

What if the dependencies were specified completely differently than by package manager or distribution? What if the initial example looked like this:

  gemspec.metadata['external_dependencies'] = {
    'sqlite'  => %w[libsqlite3-dev sqlite-devel sqlite],
    # ...
  }

Then the DNF implementation would be:

$ sudo dnf install libsqlite3-dev sqlite-devel sqlite --skip-broken
Last metadata expiration check: 1:27:14 ago on Wed Sep 14 15:17:09 2022.
No match for argument: libsqlite3-dev
Dependencies resolved.
================================================================================
 Package             Architecture  Version                 Repository      Size
================================================================================
Installing:
 sqlite-devel        x86_64        3.39.3-2.fc38           rawhide        143 k
Installing dependencies:
 sqlite              x86_64        3.39.3-2.fc38           rawhide        799 k

Transaction Summary
================================================================================
Install  2 Packages

Total download size: 942 k
Installed size: 2.2 M
Is this ok [y/N]: 

  1. I am not sure how other package managers would cope with this, but I'd assume they could handle or ignore nonexistent dependencies.
  2. Apparently, the sqlite package should not be installed 🤷‍♂️
  3. However, if the rubygems.org API provided a way to easily obtain this metadata, I think we could come up with some way to help RubyGems pick only the right dependencies, e.g. by using some virtual provides on Fedora / RHEL.

Package managers vs distros

I don't think that there is a clear winner. E.g. YUM and DNF might or might not mean the same thing, and the package names on RHEL might or might not differ from the package names on Fedora.

Don't add too much know-how about package managers into RubyGems

I might be biased as a co-author of https://github.com/voxik/gem-nice-install, but I think that the plugin way was nice. If there is an XYZ distribution, its maintainers might provide a plugin for their ABC packager. I don't think that RubyGems should know too much about other package managers or distros.

Just FTR, the gem-nice-install arguably implemented support for 3 package managers on Fedora:

https://github.com/voxik/gem-nice-install/blob/master/lib/rubygems/nice_install/fedora_ext_installer.rb

where the PackageKit way is prioritized because it allows a nice prompt for elevated user privileges. YUM and DNF support was implemented because, at that time, YUM was the default and DNF was the new kid on the block.

@postmodern
Author

@voxik you should probably read the previous comments. The original suggestion was to group packages by package manager (ex: "dnf" => %w[...], "apt" => %w[...], ...). The new proposal is to group them by platform ID (ex: "rhel" or "fedora") and use a combination of pkg-config package names and package-manager-specific package names, which allows for both testing whether the library is already installed and installing it via the system's package manager.

I think updated gemspec examples for the proposal would help clear up any confusion.

@voxik

voxik commented Sep 14, 2022

My proposal was to avoid such groups. I'm not saying it is the best proposal, but it is something to consider because, as I said, I'd advise against making assumptions.

BTW speaking of pkg-config, you can do dnf install "pkgconfig(sqlite3)" to get sqlite-devel installed on Fedora. These virtual provides should be reliable, because they are autogenerated.
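
For illustration only, the pkgconfig(...) form could be generated from a list of pkg-config module names. The method name below is hypothetical. Because the arguments would be passed to system() as a list, no shell is involved and the parentheses need no quoting.

```ruby
# Wraps pkg-config module names in the pkgconfig(...) virtual provide
# form shown above, producing an argument list for dnf install.
def dnf_pkgconfig_install_command(pkg_config_names)
  ['dnf', 'install', *pkg_config_names.map { |name| "pkgconfig(#{name})" }]
end
```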

@flavorjones

Note for posterity: psych 5 no longer bundles libyaml, and the absence of a libyaml-dev distro package in many CI images is causing builds to fail. See ruby/setup-ruby#409 for background. A proposal like this, if it is adopted and used, could prevent breakage in similar scenarios in the future.

I DMed with @postmodern and I'm going to try to turn this into a real RFC in the next few weeks.
