@postmodern
Created July 9, 2022 01:38
Draft of the upcoming RubyGems external_dependencies RFC

[RFC] Allow specifying and installing external dependencies

The Problem

Some gems require external dependencies, such as gems containing C extensions that compile against external C libraries (ex: pg) or gems that wrap around external command-line utilities (ex: graphviz). Presently there is no mechanism to automatically install these external dependencies. Instead, it is the user's responsibility to figure out the correct package name, install the external dependency via their system's package manager (ex: apt or brew), and attempt installing the gem again. This often results in users becoming frustrated when a C extension fails to build and they have to search Google and StackOverflow to find out which package needs to be installed.

Very rarely do gem authors list external dependencies via the gemspec requirements attribute, and if so, it usually lists the canonical project name of the external dependency (ex: sqlite3) and not the package name (ex: libsqlite3-dev).

The Workaround

To work around the problem of external dependencies, some popular gems (ex: nokogiri) have begun vendoring their external dependencies directly into the gem and building both the C library and any C extensions during installation. There are two downsides to this approach:

  1. Increased compilation time.
  2. Security Advisories: each time a new security advisory is published for the vendored library, the gem maintainer has to update the vendored copy of the C library, publish an updated version of the gem, and then publish their own security advisory warning users that the previous versions of their gem contain a vulnerable copy of the vendored library.

The Proposal

It should be possible to list the external dependencies required by a gem and have rubygems automatically install them from the system's package manager during the installation process.

This external_dependencies metadata could be embedded in the gemspec's metadata attribute. In order to support naming differences between different package managers, the package names for the external dependencies would need to be listed for each popular package manager.

Example:

  gemspec.metadata['external_dependencies'] = {
    'apt'  => %w[libsqlite3-dev],
    'dnf'  => %w[sqlite-devel],
    'brew' => %w[sqlite],
    # ...
  }

Sub-Proposal 1

If only one external dependency needs to be specified, then the values of the external_dependencies Hash could also be single package name Strings which would later be automatically coerced into an Array:

Example:

  gemspec.metadata['external_dependencies'] = {
    'apt'  => 'libsqlite3-dev',
    'dnf'  => 'sqlite-devel',
    'brew' => 'sqlite',
    # ...
  }

However, this might be a bit confusing?

Sub-Proposal 2

If all of the package names are the same for each package manager, then external_dependencies could be specified as an Array of Strings:

Example:

  gemspec.metadata['external_dependencies'] = %w[nmap]
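
A minimal sketch of how the three accepted shapes (Hash of Arrays, Hash of Strings, plain Array) could be normalized into a single form before use. The method name normalize_external_dependencies and the package_managers argument are hypothetical and are not part of RubyGems:

```ruby
# Hypothetical helper (not a RubyGems API): normalizes the proposed
# external_dependencies value into a Hash of package manager => Array of names.
def normalize_external_dependencies(value, package_managers)
  case value
  when Hash
    # Sub-Proposal 1: coerce single String values into one-element Arrays.
    value.transform_values { |packages| Array(packages) }
  when Array, String
    # Sub-Proposal 2: the same package names apply to every package manager.
    package_managers.to_h { |manager| [manager, Array(value)] }
  else
    raise ArgumentError, "unsupported external_dependencies value: #{value.inspect}"
  end
end

managers = %w[apt dnf brew]
p normalize_external_dependencies({ 'apt' => 'libsqlite3-dev', 'dnf' => %w[sqlite-devel] }, managers)
p normalize_external_dependencies(%w[nmap], managers)
```

Normalizing once up front would let the rest of the installation logic assume a Hash of Arrays, which may reduce the confusion that Sub-Proposal 1 worries about.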

Caveats

Detecting the System's Package Manager

Multiple package managers may be installed on the same system. In order to determine the primary system package manager, the system's package manager can be selected based on the OS/distro/flavor.

macOS is a unique edge-case since it does not have a default package manager. So there should be a prioritized list of macOS package managers to check for in order of popularity:

  1. brew
  2. port (MacPorts)
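
The prioritized check could be sketched as a search of PATH for each candidate command in order. The constant and method names below are hypothetical, and the sketch assumes that MacPorts is invoked through its port command:

```ruby
# Hypothetical sketch: candidates are listed in priority order.
# Assumption: MacPorts is invoked via the `port` command.
MACOS_PACKAGE_MANAGERS = %w[brew port].freeze

# Returns true if an executable file named `command` exists in any PATH directory.
def executable_in_path?(command, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, command)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Returns the first candidate found in PATH, or nil if none is installed.
def detect_macos_package_manager(candidates = MACOS_PACKAGE_MANAGERS, path = ENV.fetch('PATH', ''))
  candidates.find { |command| executable_in_path?(command, path) }
end

p detect_macos_package_manager
```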

Security Concerns

If gems can specify package names that will be installed via an apt-get install or brew install command, special care should be taken to prevent arbitrary command injection. system() with multiple arguments (ex: system('apt-get','install',...)) or Shellwords.shellescape must be used to prevent command injection.

In order to prevent option injection via the gemspec's package names, an argument of '--' or '-' can be placed before the package names to terminate option parsing, so that the package names are not accidentally parsed as options.
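
Both precautions can be sketched together by building an argument vector instead of a shell string. The install_command helper is hypothetical, and the sketch assumes the target package manager honors '--' as an end-of-options marker:

```ruby
# Hypothetical helper: builds an argv Array for a package manager invocation.
# Assumption: the package manager treats '--' as the end of its options.
def install_command(manager_argv, package_names)
  package_names.each do |name|
    unless name.is_a?(String) && !name.empty?
      raise ArgumentError, "invalid package name: #{name.inspect}"
    end
  end

  # Everything after '--' is treated as a package name, even if it
  # starts with a dash.
  [*manager_argv, '--', *package_names]
end

argv = install_command(%w[apt-get install], ['libsqlite3-dev'])
p argv

# Passing each argument separately to system() bypasses the shell, so shell
# metacharacters in a package name are never interpreted:
#   system(*argv)
```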

Configurability

Some users may wish to customize which package manager is used on their system, if they have multiple package managers installed alongside each other. It may be necessary to add a configuration option or environment variable to control which package manager is used by default. Although, this feature request seems to be very rare.

Possible Deprecations

If we allow annotating the external dependencies and automatically installing them along with the gem, the gemspec's requirements attribute might no longer be necessary and could be deprecated in the future?

Previous Work

@flavorjones

@kou Thank you for your response above.

I like "package dependency"

I like this!

I think that we request a new attribute for this instead of reusing metadata, requirements and so on

OK!

I think that package manager name as key isn't suitable.

Your explanation of this was very helpful, thank you. @postmodern are you OK with using a "platform ID" as the key?

(postmodern): we could create our own syntax to say "check for package A, otherwise install package B", and it could look like "mariadb|mysqlclient ..."
(kou) I agree to the "check for package A, otherwise install package B"

OK, I don't object to this.

I agree to detection by pkg-config command. But I suggest pkg-config instead of pkg for key.

I'm OK with this. @postmodern what do you think?

For toolchain ... I think that we don't need to install them automatically

OK, we can always consider this later if we need it.

Some packages need extra action to be installed ...

Your explanation here was very helpful, too. Would you be OK if an RFC was created and supporting these cases was discussed there?

Scenario 1 ... I didn't think that this was abuse

I like this approach! I don't think this is abuse, either.

Scenario 2 ... we can read .gemspec from extconf.rb if needed

Fair! OK.

Scenario 3.b ... it may be better that sqlite3-ruby is split to sqlite3-ruby and sqlcipher-ruby

Fair! sqlcipher support is a bit of a hack and should not delay this RFC.

...

@postmodern Do you feel OK with moving to an RFC at this point?

@flavorjones

Clarifying one point above: for the runtime-only dependencies, the declaration in the gemspec would be sufficient for gem/bundler to install those dependencies. @ktou's point about using a Rakefile not abusing the C extension mechanism is orthogonal to the feature being proposed.

@postmodern
Author

postmodern commented Sep 13, 2022

@kou

Scenario 1: runtime-only dependency

I'm using native-package-installer/pkg-config for this scenario with the Rakefile approach mentioned by @postmodern. (I didn't think that this is abuse...)

Eh, I respectfully disagree. Using gemspec.extensions with the Rakefile trick in order to install additional runtime-only dependencies, for ruby gems which do not contain C extensions, seems like we are using gemspec.extensions and the ext-builder API within rubygems for something they were not originally designed for; hence why I call it "abuse". Perhaps, if we added a new gemspec.post_installation_script attribute to run an additional script, that might be a more appropriate place to add custom logic which executes during the installation process. However, there are obviously security concerns with adding another attribute which can execute arbitrary code during installation. This is why I prefer static and declarative data in the gemspec, with rubygems handling the logic; when possible of course.

I think there are actually two possible RFCs here which we could submit.

  1. Add a new gemspec attribute for declaring external (build or runtime) dependencies and logic in rubygems to check for and/or handle installing the declared dependencies. This is a far more ambitious RFC and contains all of the discussions about syntax and how to represent optional or alternative package names, etc.
  2. Add a new mkmf method called install_packages (or something similar) so that C extensions with complex extconf.rb files, or C extensions which need to support being installed on older versions of rubygems, can explicitly install additional external build-time or runtime dependencies. This is definitely the more pragmatic and doable RFC. We already have many C extensions with complex extconf.rb files that use mkmf in order to check for the existence of pre-installed libraries, so adding additional install_packages method calls to them should not be that difficult.

Also, I am currently heads down trying to wrap up a very large refactor of an old project (only 62 remaining issues!), so I cannot estimate how much time I can allocate to the RFC process or development right now. :/

@postmodern
Author

@flavorjones

I think that package manager name as key isn't suitable.

Your explanation of this was very helpful, thank you. @postmodern are you OK with using a "platform ID" as the key?

I would only suggest that we allow for a generic platform ID for all Debian-based distros or all RedHat-based distros, as there are many packages which have the same name between Debian and Ubuntu or RHEL and Fedora. This would reduce having to specify the same package name for every Debian or RedHat derivative platform.
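
One way such a generic fallback might work is sketched below, assuming the platform ID comes from the ID field of /etc/os-release and the generic family from ID_LIKE. The 'debian' and 'fedora' keys and the helper names are hypothetical, not agreed syntax:

```ruby
# Parses the text of an os-release file into a Hash of field name => value.
def parse_os_release(text)
  text.each_line.with_object({}) do |line, fields|
    key, value = line.strip.split('=', 2)
    next if key.nil? || value.nil? || key.start_with?('#')

    fields[key] = value.delete_prefix('"').delete_suffix('"')
  end
end

# Most specific ID first, followed by the families listed in ID_LIKE.
def platform_ids(os_release)
  [os_release['ID'], *os_release.fetch('ID_LIKE', '').split].compact
end

# Hypothetical lookup: use the first platform ID that the gem declares.
def packages_for(dependencies, os_release)
  id = platform_ids(os_release).find { |platform| dependencies.key?(platform) }
  id ? Array(dependencies[id]) : []
end

dependencies = { 'debian' => %w[libsqlite3-dev], 'fedora' => %w[sqlite-devel] }
ubuntu = parse_os_release("ID=ubuntu\nID_LIKE=debian\n")
p packages_for(dependencies, ubuntu)
```

With this lookup order, a gem could still add an 'ubuntu' key to override the generic 'debian' entry when the package names differ.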

I agree to detection by pkg-config command. But I suggest pkg-config instead of pkg for key.

I'm OK with this. @postmodern what do you think?

I'm OK with this. I am assuming the majority of package names match their pkg-config name, so this shouldn't be too difficult to guess or look up.

For toolchain ... I think that we don't need to install them automatically

OK, we can always consider this later if we need it.

Agreed. If the ruby was compiled from source then a toolchain was already installed. If the ruby was installed from the package manager, then the package manager hopefully also installed the toolchain packages (this is the case with ruby-dev and ruby-devel packages).

Scenario 2 ... we can read .gemspec from extconf.rb if needed

What about the edge-case when there are multiple .gemspec files in the unpacked gem source directory? Or would we query the rubygems API for the Gem::Specification of the gem which is being installed? Maybe this is something an install_package/install_packages mkmf method could handle?

Scenario 3.b ... it may be better that sqlite3-ruby is split to sqlite3-ruby and sqlcipher-ruby

Fair! sqlcipher support is a bit of a hack and should not delay this RFC.

I agree. It would be a lot of work to support defining either/or external dependencies and then detecting whether they are pre-installed or prompting the user to select which one they would wish to install. I feel like an extconf.rb file could better handle such complex logic. Creating separate ruby libraries for each C library backend is also an option.

@postmodern Do you feel OK with moving to an RFC at this point?

I am mostly OK with the proposal for adding a new gemspec attribute for installing build/runtime dependencies. Some of the syntax and constraints of what can actually go into the gemspec will probably be further discussed once we submit an RFC to rubygems. Although, I would like to see an example of the newly proposed syntax using platform ID and pkg-config names.

@byroot

byroot commented Sep 13, 2022

Security Concerns

Should rubygems/bundler even attempt to install the package automatically though? On most systems that would require elevated permissions, which means a password prompt etc. And if you do a password prompt, you need to check whether you are in a TTY or not, otherwise you might hang forever etc.

IMHO just checking the package exists and giving a clear error message to the user with maybe the command to run should be enough, no?

@postmodern
Author

@byroot this could be addressed by simply checking the return value of sudo and printing the appropriate error message, or by checking $stdout.tty? and falling back to simply checking if the package is already installed and, if not, printing an error message instructing the user to install the associated package name(s).
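
A rough sketch of that fallback decision, kept free of side effects so it can be reasoned about separately from the actual sudo call. The install_plan method, its arguments, and its return symbols are all hypothetical:

```ruby
# Hypothetical sketch: decides what to do with the declared packages.
#   installed - names of packages already present on the system
#   tty       - whether an interactive terminal is available
def install_plan(packages, installed:, tty:)
  missing = packages.reject { |name| installed.include?(name) }
  return [:nothing_to_do, []] if missing.empty?

  # Only attempt a sudo-based install when a password prompt is possible;
  # otherwise report the missing packages so the user can install them.
  tty ? [:install, missing] : [:report_missing, missing]
end

action, missing = install_plan(%w[libsqlite3-dev], installed: [], tty: $stdout.tty?)
warn "Please install: #{missing.join(' ')}" if action == :report_missing
```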

@kou

kou commented Sep 13, 2022

Add a new mkmf method called install_packages (or something similar) ...

This is the approach that pkg-config/native-package-installer does. If we're OK with the approach, we don't need to propose a new feature to RubyGems.

And if you do a password prompt, you need to check whether you are in a TTY or not, otherwise you might hang forever etc.

sudo fails without a TTY. So we don't need to do anything for this case.

native-package-installer already has the feature https://github.com/ruby-gnome/native-package-installer/blob/master/lib/native-package-installer.rb#L60-L63 and it works (no hang, raises an exception instead) on CI.

giving a clear error message to the user with maybe the command to run

native-package-installer already has the feature https://github.com/ruby-gnome/native-package-installer/blob/master/lib/native-package-installer.rb#L83-L87 . It's shown only when sudo ... fails.

@byroot

byroot commented Sep 13, 2022

sudo fails without a TTY.

It was one example among many. Docker container builds hanging forever because apt is waiting on a prompt is a common problem (generally solved with DEBIAN_FRONTEND=noninteractive). There is likely a very long tail of problems like this.
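
For illustration, the DEBIAN_FRONTEND workaround could be applied by passing an environment Hash as the first argument to system(). The helper name is hypothetical, and this covers only the apt prompt case, not the long tail mentioned above:

```ruby
# Hypothetical helper: returns the environment and argv for a
# non-interactive apt-get install, without running anything.
def noninteractive_apt_install(packages)
  env  = { 'DEBIAN_FRONTEND' => 'noninteractive' }
  argv = ['apt-get', 'install', '--yes', '--', *packages]
  [env, argv]
end

env, argv = noninteractive_apt_install(%w[libyaml-dev])
p env, argv

# Kernel#system accepts an environment Hash before the command:
#   system(env, *argv)
```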

What I'm trying to get at is that I suspect rubygems/bundler will likely consider this a huge can of worms they won't have any desire to support. Hence why I'm suggesting to include a scaled-down proposal in case they don't want to go all the way.

But it's merely a suggestion really.

@kou

kou commented Sep 14, 2022

There are many Dockerfiles that work without a TTY. So I think that there is no technical difficulty around TTY.

@voxik

voxik commented Sep 14, 2022

Several random notes (and apologies if something was already mentioned and I missed it).

Dependency specification

What if the dependencies were specified completely differently than by package manager or distribution? What if the initial example looked like this:

  gemspec.metadata['external_dependencies'] = {
    'sqlite'  => %w[libsqlite3-dev sqlite-devel sqlite],
    # ...
  }

Then the DNF implementation would be:

$ sudo dnf install libsqlite3-dev sqlite-devel sqlite --skip-broken
Last metadata expiration check: 1:27:14 ago on Wed Sep 14 15:17:09 2022.
No match for argument: libsqlite3-dev
Dependencies resolved.
================================================================================
 Package             Architecture  Version                 Repository      Size
================================================================================
Installing:
 sqlite-devel        x86_64        3.39.3-2.fc38           rawhide        143 k
Installing dependencies:
 sqlite              x86_64        3.39.3-2.fc38           rawhide        799 k

Transaction Summary
================================================================================
Install  2 Packages

Total download size: 942 k
Installed size: 2.2 M
Is this ok [y/N]: 

  1. I am not sure how other package managers would cope with this, but I'd assume they could handle / ignore nonexistent dependencies
  2. Apparently, the sqlite package should not be installed 🤷‍♂️
  3. However, if the rubygems.org API provided a way to easily obtain this metadata, I think we could come up with some way to help RubyGems pick only the right dependencies, e.g. by using some virtual provides on Fedora / RHEL

Package managers vs distros

I don't think that there is a clear winner. E.g. YUM and DNF might or might not mean the same thing, and the package names on RHEL might or might not differ from the package names on Fedora.

Don't add too much know-how about package managers into RubyGems

I might be biased as a co-author of https://github.com/voxik/gem-nice-install, but I think that the plugin way was nice. If there is the XYZ distribution, their maintainers might provide a plugin for their ABC packager. I don't think that RubyGems should know too much about any other package managers or distros.

Just FTR, the gem-nice-install arguably implemented support for 3 package managers on Fedora:

https://github.com/voxik/gem-nice-install/blob/master/lib/rubygems/nice_install/fedora_ext_installer.rb

where the PackageKit way is prioritized, because it allows a nice prompt for elevated user privileges. The YUM and DNF support was implemented because, at that time, YUM was the default and DNF was the new kid on the block.

@postmodern
Author

@voxik you should probably read the previous comments. The original suggestion was to group packages by package manager (ex: "dnf" => %w[...], "apt" => %w[...], ...). The new proposal is to group them by platform ID (ex: "rhel" or "fedora") and use a combination of pkg-config package names and package manager specific package names, which allows for both testing if the library was already installed and installing it via the system's package manager.

I think updated gemspec examples for the proposal would help clear up any confusion.

@voxik

voxik commented Sep 14, 2022

My proposal was to avoid such groups. Not saying it is the best proposal, but something to consider. Because, as I said, I'd advise against making assumptions.

BTW speaking of pkg-config, you can do dnf install "pkgconfig(sqlite3)" to get sqlite-devel installed on Fedora. These virtual provides should be reliable, because they are autogenerated.
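
As a small sketch of that idea, pkg-config names could be turned into DNF virtual provides mechanically. The helper name is hypothetical, and the resulting command is only printed, not run:

```ruby
# Hypothetical helper: wraps pkg-config names in the pkgconfig(...) form
# that DNF resolves through autogenerated virtual provides.
def dnf_virtual_provides(pkg_config_names)
  pkg_config_names.map { |name| "pkgconfig(#{name})" }
end

argv = ['dnf', 'install', *dnf_virtual_provides(%w[sqlite3])]
p argv
```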

@flavorjones

Note for posterity: psych 5 no longer bundles libyaml, and the absence of a libyaml-dev distro package in many CI images is causing builds to fail. See ruby/setup-ruby#409 for background. A proposal like this, if it is adopted and used, could prevent breakage in similar scenarios in the future.

I DMed with @postmodern and I'm going to try to turn this into a real RFC in the next few weeks.
