@postmodern
Created July 9, 2022 01:38
Draft of the upcoming RubyGems external_dependencies RFC

[RFC] Allow specifying and installing external dependencies

The Problem

Some gems require external dependencies, such as gems containing C extensions that compile against external C libraries (ex: pg) or gems that wrap around external command-line utilities (ex: graphviz). Presently there is no mechanism to automatically install these external dependencies. Instead, it is the user's responsibility to figure out the correct package name, install the external dependency via their system's package manager (ex: apt or brew), and attempt installing the gem again. This often leaves users frustrated when a C extension fails to build, and they then have to search Google and Stack Overflow to find which package needs to be installed.

Very rarely do gem authors list external dependencies via the gemspec requirements attribute, and when they do, they usually list the canonical project name of the external dependency (ex: sqlite3) and not the package name (ex: libsqlite3-dev).

The Workaround

To work around the problem of external dependencies, some popular gems (ex: nokogiri) have begun vendoring their external dependencies directly into the gem and building both the C library and any C extensions during installation. There are two downsides to this approach:

  1. Increased compilation time.
  2. Security Advisories: each time a new security advisory is published for the vendored library, the gem maintainer has to update the vendored copy of the C library, publish an updated version of the gem, and then publish their own security advisory warning users that the previous versions of their gem contain a vulnerable copy of the vendored library.

The Proposal

It should be possible to list the external dependencies required by a gem and have rubygems automatically install them from the system's package manager during the installation process.

This external_dependencies metadata could be embedded in the gemspec's metadata attribute. In order to support naming differences between different package managers, the package names for the external dependencies would need to be listed for each popular package manager.

Example:

  gemspec.metadata['external_dependencies'] = {
    'apt'  => %w[libsqlite3-dev],
    'dnf'  => %w[sqlite-devel],
    'brew' => %w[sqlite],
    # ...
  }

Sub-Proposal 1

If only one external dependency needs to be specified, then the values of the external_dependencies Hash could also be single package name Strings which would later be automatically coerced into an Array:

Example:

  gemspec.metadata['external_dependencies'] = {
    'apt'  => 'libsqlite3-dev',
    'dnf'  => 'sqlite-devel',
    'brew' => 'sqlite',
    # ...
  }

However, this might be a bit confusing?

Sub-Proposal 2

If all of the package names are the same for each package manager, then external_dependencies could be specified as an Array of Strings:

Example:

  gemspec.metadata['external_dependencies'] = %w[nmap]
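
The three shapes proposed so far (a Hash of Arrays, a Hash of Strings, and a bare Array) could be normalized into one form before use. The following is only a minimal sketch: the helper name normalize_external_dependencies and the KNOWN_PACKAGE_MANAGERS list are hypothetical and do not exist in RubyGems.

```ruby
# Hypothetical list of package manager keys; not part of RubyGems.
KNOWN_PACKAGE_MANAGERS = %w[apt dnf brew].freeze

# Normalizes any of the three proposed shapes into a Hash that maps
# each package manager name to an Array of package names.
def normalize_external_dependencies(value)
  case value
  when Hash
    # Sub-Proposal 1: coerce single Strings into one-element Arrays.
    value.to_h { |manager, packages| [manager.to_s, Array(packages).map(&:to_s)] }
  when Array
    # Sub-Proposal 2: the same package names apply to every manager.
    KNOWN_PACKAGE_MANAGERS.to_h { |manager| [manager, value.map(&:to_s)] }
  else
    raise ArgumentError, "unsupported external_dependencies: #{value.inspect}"
  end
end
```

Normalizing early would let the rest of the installer deal with a single shape, at the cost of having to decide which package managers a bare Array applies to.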

Caveats

Detecting the System's Package Manager

Multiple package managers may be installed on the same system. The primary system package manager can be selected based on the OS/distro/flavor.

macOS is a unique edge-case since it does not have a default package manager. So there should be a prioritized list of macOS package managers to check for in order of popularity:

  1. brew
  2. ports
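
The selection logic described above might look roughly like the sketch below. Everything here is an assumption made for illustration: the method names, the small distro table (keyed by the ID field of /etc/os-release), and the use of `port` as the MacPorts command name.

```ruby
# Hypothetical mapping from /etc/os-release ID values to package managers.
DISTRO_PACKAGE_MANAGERS = {
  'debian' => 'apt', 'ubuntu' => 'apt',
  'fedora' => 'dnf', 'rhel'   => 'dnf',
  'arch'   => 'pacman', 'alpine' => 'apk'
}.freeze

# macOS candidates in priority order (the MacPorts command is `port`).
MACOS_PACKAGE_MANAGERS = %w[brew port].freeze

# Returns true if +command+ is an executable file in one of +path_dirs+.
def command_available?(command, path_dirs)
  path_dirs.any? do |dir|
    path = File.join(dir, command)
    File.file?(path) && File.executable?(path)
  end
end

# Picks a package manager from the host OS name and, on Linux, from the
# contents of /etc/os-release. Returns nil when nothing matches.
def detect_package_manager(host_os:, os_release: nil,
                           path_dirs: ENV.fetch('PATH', '').split(File::PATH_SEPARATOR))
  if host_os.match?(/darwin/)
    MACOS_PACKAGE_MANAGERS.find { |command| command_available?(command, path_dirs) }
  elsif os_release
    id = os_release[/^ID=(.*)$/, 1].to_s.delete('"')
    DISTRO_PACKAGE_MANAGERS[id]
  end
end
```

A caller would pass RbConfig::CONFIG['host_os'] and the text of /etc/os-release when that file exists. Keeping those as parameters makes the choice easy to override, which fits the Configurability caveat below.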

Security Concerns

If gems can specify package names that will be installed via an apt-get install or brew install command, special care must be taken to prevent arbitrary command injection. system() with multiple arguments (ex: system('apt-get','install',...)) or Shellwords.shellescape must be used to prevent command injection.

In order to prevent option injection via the gemspec's package names, an argument of '--' or '-' can be specified before the package names to terminate option parsing, so that the package names are not accidentally parsed as options.
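
As a rough illustration of both points, the sketch below builds an argument Array for apt-get instead of a shell string. The method name apt_install_command and the PACKAGE_NAME allow-list are hypothetical, and the name check is an extra precaution that the proposal itself does not mention.

```ruby
# Hypothetical allow-list pattern for package names; real package managers
# may accept a different character set.
PACKAGE_NAME = /\A[A-Za-z0-9][A-Za-z0-9+._-]*\z/

# Builds the argument Array for installing +packages+ with apt-get.
# Nothing is executed here; the caller decides when to run it.
def apt_install_command(packages)
  packages.each do |name|
    unless name.match?(PACKAGE_NAME)
      raise ArgumentError, "suspicious package name: #{name.inspect}"
    end
  end
  # '--' ends option parsing, so a later argument cannot act as an option.
  ['apt-get', 'install', '--', *packages]
end
```

The result would be run with system(*apt_install_command(names)), which passes each element as a separate argument and never invokes a shell.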

Configurability

Some users may wish to customize which package manager is used on their system, if they have multiple package managers installed alongside each other. It may be necessary to add a configuration option or environment variable to control which package manager is used by default. Although, this feature request seems to be very rare.

Possible Deprecations

If we allow annotating the external dependencies and automatically installing them along with the gem, the gemspec's requirements attribute might no longer be necessary and could be deprecated in the future?

Previous Work

@flavorjones

Thanks for drafting this! I'm not sure where you'd like comments, so I'll put some thoughts here. I'd be happy to write these up somewhere more permanent if you like.

Naming! Of course. And I apologize.

"External dependency" is what's used in this proposal for the same concept I call "system dependency" in https://github.com/flavorjones/ruby-c-extensions-explained and what native-package-installer calls a "native package". I don't feel strongly about naming, but "external dependency" isn't familiar to me and feels imprecise (aren't all dependencies external?). I'd like to encourage the use of a term that's already in use.

But I don't feel too strongly and will use "external dependency" in these comments.

Gem::Specification#metadata

All the proposals above use Hash data stored in Gem::Specification#metadata, but unfortunately metadata values must be Strings, which makes this attribute a bit challenging to use.

Perhaps this proposal could specify that #metadata should start allowing Hash values, or introduce a new attribute to hold external dependency information. But either way means challenges around backwards-compatibility with older versions of RubyGems.

Another option could be to borrow the approach from JRuby's jar-dependencies project (which defined Maven dependencies in the gemspec) which appends to the Gem::Specification#requirements array, which is an Array[String]. (I personally like #requirements a bit better because this seems to be exactly the intention behind the attribute!)

Detecting when the dependency is already installed

I think we need to provide a way to detect if the package has already been installed that handles both package-manager-managed packages as well as manually-compiled-and-installed libraries. I'd like to propose that we adopt the pkg-config name for the package (which perhaps naively presumes one exists).

The pkg-config names are consistent across platforms, this is what pkg-config is intended to do, and it seems like most distros come with pkg-config installed (we may want to verify this, but it's true for the main linux distros, for homebrew, and for msys2/mingw (I think)). Users can then set PKG_CONFIG_PATH to find manually-installed libraries in custom directories.

So a configuration might look something like (and I'm using #requirements and a JSON string for reasons explained above):

# Platforms: apt = debian family; apk = alpine; pacman = msys2;
# dnf = centos, redhat; homebrew = macos, or wherever homebrew is installed.
spec.requirements << <<~JSON
  {
    "pkg": "libxml-2.0",
    "apt": "libxml2-dev",
    "apk": "libxml2-dev",
    "pacman": "libxml2",
    "dnf": "libxml2-devel",
    "homebrew": "libxml2"
  }
JSON

and at gem install time, RubyGems would do the following "presence check":

  1. if "pkg" key is present, use pkg-config to determine if a package named libxml-2.0 is present on the system
  2. if not, query all package managers present on the system for the named package(s)
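
Step 1 of this presence check could be sketched as follows. This is an assumption-laden illustration rather than RubyGems behavior: the method names and the injectable runner are hypothetical, and only the pkg-config step is shown (`pkg-config --exists` exits with status 0 when the named package is known).

```ruby
require 'json'

# Default runner: executes the command quietly and returns system's result.
PKG_CONFIG_RUNNER = ->(argv) { system(*argv, out: File::NULL, err: File::NULL) }

# Returns true if pkg-config reports that +name+ is present.
def pkg_config_present?(name, runner: PKG_CONFIG_RUNNER)
  runner.call(['pkg-config', '--exists', name]) == true
end

# Given one JSON requirement and a package manager key, returns the package
# that would need installing, or nil if pkg-config already finds the library.
def missing_package(requirement_json, manager, runner: PKG_CONFIG_RUNNER)
  requirement = JSON.parse(requirement_json)
  pkg = requirement['pkg']
  return nil if pkg && pkg_config_present?(pkg, runner: runner)
  requirement[manager]
end
```

Because the check goes through pkg-config first, a library found via PKG_CONFIG_PATH would count as present even if no package manager installed it.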

Handling multiple dependencies

If you agree with the previous section, then when there are multiple dependencies we will need multiple dependency definitions. This, I think, makes #requirements a better choice than #metadata, so that we can do something like:

spec.requirements << <<~JSON
  {
    "pkg": "libxml-2.0",
    "apt": "libxml2-dev",
    "apk": "libxml2-dev",
    "pacman": "libxml2",
    "dnf": "libxml2-devel",
    "homebrew": "libxml2"
  }
JSON

spec.requirements << <<~JSON
  {
    "pkg": "libxslt",
    "apt": "libxslt1-dev",
    "apk": "libxslt-dev",
    "pacman": "libxslt",
    "dnf": "libxslt-devel",
    "homebrew": "libxslt"
  }
JSON

Detecting which elements of #requirements are external dependencies will probably require a "syntax" of sorts, like used by jar-dependencies, but maybe that's something simple like the characters "pkg " at the start.
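
One way to separate such entries from free-form requirement strings is sketched below. It assumes, purely for illustration, that an external dependency is any entry that parses as a JSON object; the "pkg " prefix idea above would work just as well, and the method name is hypothetical.

```ruby
require 'json'

# Returns the parsed Hash for every requirement that is a JSON object,
# silently skipping free-form strings and malformed JSON.
def external_dependency_requirements(requirements)
  requirements.filter_map do |entry|
    next unless entry.lstrip.start_with?('{')
    parsed = JSON.parse(entry)
    parsed if parsed.is_a?(Hash)
  rescue JSON::ParserError
    nil
  end
end
```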

Opt-in

I think the installation of external dependencies should require the user to opt-in to the behavior. Perhaps the detection can always happen, but a sane failure message can be displayed telling the user how to opt-in (like a --install-external-dependencies flag on gem install or bundle install).

Multiple packages might meet the requirements

There are going to be situations where multiple packages might meet the gem's requirements, for example mysql2 which builds with either libmysqlclient or libmariadb. How can we handle this? Maybe something like:

spec.requirements << <<~JSON
  {
    "pkg": ["mariadb", "mysqlclient"],
    "apt": "libmariadb", # gem author has to pick at most one
    ...
  }
JSON

which would use pkg-config to check for the presence of either mariadb or mysqlclient before proceeding to the installation phase.

Another example of this is the sqlite3 gem which can be satisfied by either sqlite or sqlcipher.

Toolchain

I think we should also give some thought to allowing a gem with an extension to define the toolchain necessary to build it, too. (Note that we have an entire section in the Nokogiri manual dedicated to this topic.)

For C extensions, maybe it's something like:

spec.requirements << "toolchain: cc"

and the "cc" toolchain might reasonably be implemented under the hood as a set of external dependencies like this:

[
  { "apt": "gcc", ... },
  { "apt": "make", ... },
  { "apt": "patch", ... }
]

This suggestion feels pretty squishy, but I recently had someone reach out to me to ask about adding Rust extension support to rake-compiler-dock (and there are existing C++ extensions) so it feels like we should at least think about how we might support multiple toolchains.

@postmodern
Author

@flavorjones I should have probably put this into a Google Doc to allow for better commenting...

Naming

OK I can see how external_dependency might be confusing, considering that other gem dependencies are technically "external". However, "external" here is trying to imply that the dependency is outside of the RubyGems ecosystem. Some other possible names:

  • system_dependencies
  • package_dependencies
  • packages
  • system_packages

metadata

Good catch. A quick workaround would be to just add the package lists as space separated Strings to metadata:

gemspec.metadata['apt'] = "libfoo-devel bar baz"
gemspec.metadata['dnf'] = "libfoo2-dev bar baz"
gemspec.metadata['brew'] = "foo bar baz"

Alternatively, we could call the metadata keys apt_pkgs or something similar?
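
Reading the space-separated Strings back out would be simple. A minimal sketch, assuming the metadata is a plain Hash of Strings keyed by package manager name as in the example above (the method name is hypothetical):

```ruby
# Returns the package names declared for +manager+, or an empty Array
# when the gemspec metadata has no entry for it.
def packages_from_metadata(metadata, manager)
  metadata.fetch(manager, '').split
end
```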

I would argue against extending gemspec.requirements as it's already an arbitrary String, so there's no real way to validate it which also makes parsing and extracting the package metadata from it error prone. Also embedding JSON into a Ruby String, which then later gets converted into YAML, seems kind of hacky.

Ideally, we should be able to just add new attributes to Gem::Specification, but that wasn't thought of when the .gem package format was originally designed. I think allowing nested Hashes within metadata is the second best thing, assuming RubyGems can figure out how to recursively validate metadata and catch infinite referential Hashes.

Determining Already Installed Packages

Good news. Most package managers already ignore previously installed packages. The exceptions are brew and pacman (Arch). Luckily, ruby-install already includes some code to filter out previously installed packages when using brew or pacman.

Opt-In

This is a good idea, especially considering it's a new feature, and we probably will need to beta-test it before turning it on for everyone. Although, it should be a one-time opt-in, possibly done via ~/.gemrc or some command that adds a flag file to ~/.gem/.

Multiple Optional Dependencies

Eh, this gets complex. I believe both Debian (deb) and RedHat (rpm) packages have generic meta package names which will resolve to whichever package is installed? This allows other packages to depend on either mariadb or mysqlclient, but that forces the user to decide which one to install first. I would err on the side of caution and only support hard-dependencies first, that way no user interaction is required during installation.

Toolchains

Gems could define explicit package dependencies on gcc or clang, or simply use the generic package group of build-essential (apt) or C Development Tools and Libraries (dnf). No need to re-invent package meta-groups as package managers already provide us with them.

@postmodern
Author

There are going to be situations where multiple packages might meet the gem's requirements, for example mysql2 which builds with either libmysqlclient or libmariadb. How can we handle this? Maybe something like:

Debian's Virtual dependencies was what I was looking for.

@flavorjones

I've been thinking about this quite a bit over the last few days, and I'm less excited by a declarative gemspec approach now having played out a few scenarios in my head.

Circling back on detection

Specifically I want to point out that I didn't communicate clearly in my first comment about "detecting already-installed packages". What I meant by that is manually installed libraries, not previously-installed-by-package-manager packages.

For example, imagine a user who is insanely concerned with performance and so has compiled their own libxml2 with some compiler options like -march and -O3; and has set the env var PKG_CONFIG_PATH so that the pkg-config file libxml-2.0.pc can be found.

Or consider a Mac user who's already got a version installed through macports, and won't want to install another via homebrew.

This is part of the reason I suggested the syntax in my previous comment -- wrapping a set of declarations with a pkg-config name. Does that suggestion make more sense now?

I think any solution needs to be able to handle situations like this -- it needs to use pkg-config and not blindly rely on the package managers as the first and only option. Do you agree?

Scenarios

I like to go through concrete scenarios as a thought experiment for new APIs.

Scenario 1: runtime-only dependency: ffi-libarchive

The proposed solution seems perfect for a gem that has no Gem::Specification#extensions defined but has a runtime dependency on a system library.

For example, something like ffi-libarchive which uses FFI to bind to libarchive.

spec.metadata['apt-package'] = "libarchive13"
spec.metadata['homebrew-package'] = "libarchive"
# ...

and then the gem at runtime calls (file):

extend FFI::Library
ffi_lib %w{libarchive.so.13 libarchive.13 libarchive-13 libarchive.so libarchive archive}

Scenario 2: Simple C extension: rcairo

I can imagine a solution like this being valid for straightforward integrations that are insensitive to version and don't have many install-time options.

For example, something like rcairo which currently uses native-package-installer (extconf.rb):

unless required_pkg_config_package([package, major, minor, micro],
                                   :arch_linux => "cairo",
                                   :debian => "libcairo2-dev",
                                   :homebrew => "cairo",
                                   :macports => "cairo",
                                   :msys2 => "cairo",
                                   :redhat => "cairo-devel")
  exit(false)
end

(The code above is taken from the extconf.rb, and does both a detection and an installation phase for the libcairo2 library.)

Under the proposed solution, the dependencies would be declared as follows in the gemspec:

spec.metadata['apt-package'] = "libcairo2-dev"
spec.metadata['pacman-package'] = "cairo"
spec.metadata['homebrew-package'] = "cairo"
# ...

In the extconf.rb the "native package installation" code can be deleted, however the pkg-config/detection bit must remain so the extension knows the compiler and linker flags (see https://github.com/rcairo/rcairo/blob/master/ext/cairo/extconf.rb#L42), which means it's not a big advantage to use the declarative gemspec syntax. And there's some risk that the detection done in the extconf doesn't match the detection done by RubyGems -- I worry that we'd be duplicating that logic.

I'm not sure I like the separation of the dependency declaration from the code where those dependencies are used. Previously the extconf.rb was the canonical place to look, but we'd be placing the names into the gemspec, and the logic around detection is in two places (RubyGems and extconf.rb).

This scenario would be supported, but it doesn't feel like a clear win to me.

Scenario 3.a: C extension with "use whichever" optionality

Let's look at the mysql2 gem which is pretty straightforward except for the optionality around the client library (supports mariadb and mysqlclient).

I'm not sure how to express this optionality with the proposed solution. The extconf.rb somewhat relies on a system only having one or the other installed -- so if it finds either, it uses it. How do we express this detection logic in the gemspec? Which one should be installed if no match is found? Are you suggesting just listing both?

My earlier suggestion of listing multiple pkg-config names might help, but I still worry about complex detection being done in both RubyGems and in the extconf.rb.

You mentioned generic package names, but to me this means we're taking something that can be expressed easily in Ruby in an extconf.rb and moving that responsibility into the distro package manager (assuming it supports the functionality and a suitable virtual package) which feels less obvious and more brittle.

Scenario 3.b: C extension with "choose one at install time" optionality

A gem with a different flavor of optionality is sqlite3-ruby which can use either libsqlite or libsqlcipher; and the difference here from mysql2 is that both libsqlite and libsqlcipher can be installed on the system at the same time.

The extension defaults to look for (and use) libsqlite, but with a runtime option --with-sqlcipher, the gem installation will instead use libsqlcipher.

I don't think we're able to implement an extension like this with a declarative gemspec syntax. Only at install-time is it possible to know which of the two libraries the user wants to use.

Maybe there's a cheaper alternative

In short, it feels like declaring dependencies in the gemspec isn't the silver bullet I originally imagined it would be. It also feels complex enough that getting it right for the use cases where it's a good solution might take some iteration, and RubyGems feels like a heavyweight solution that will be challenging to iterate within (but maybe I'm wrong). Finally, the detection phase still needs to be duplicated in both RubyGems and in the extconf.rb.

Instead of a declarative syntax in the gemspec, though, what if we provided a great set of tools for C extension authors to use in their extconf.rb scripts? The combination of native-package-installer and the pkg-config gem feels like a good start, though I'd want to add:

  • actionable failure messages that are easy to understand
  • standard commandline options to opt-in to package installation

Why not use these today?

I think we could! -- in fact as part of a recent overhaul of the sqlite3-gem I had an experimental branch that used native-package-installer and it mostly worked; but we instead decided to go with a precompiled library.

One obstacle to using native-package-installer and pkg-config (the gem) is that they're additional gem dependencies which aren't actually needed at runtime (only at install-time). Further, in the past people have objected to introducing gem dependencies which are LGPL-licensed -- specifically the pkg-config gem and native-package-installer are both released under LGPL.

(To work around that objection, we might propose a change to RubyGems that introduces a new "install-time dependency" in addition to the current "runtime dependency" and "development dependency", which would allow folks to delete install-time dependencies once the gem is installed, or to ignore the licenses of install-time dependencies. Would love to hear @kou's thoughts about that.)

We can start experimenting with this approach today, without having to first get approval from RubyGems maintainers.

If this approach works well, we could later propose moving the code into Ruby itself as companions to MakeMakefile for C extensions.

@postmodern
Author

postmodern commented Jul 11, 2022

I think it might help to define the scope here. We should focus only on installing the minimal required runtime or build-time dependencies. Issues related to detection and linking against installed libraries should be left to extconf, mkmf, pkg-config, etc.

For example, since optional dependencies keep being mentioned, we could create our own syntax to say "check for package A, otherwise install package B", and it could look like "mariadb|mysqlclient ...". However, just because mariadb is installed does that really mean the user wants the mysql gem to compile its C extensions against mariadb instead of libmysqlclient? Perhaps the user installed mariadb for some other project and forgot about it?
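
To make the idea concrete, here is one possible reading of the "A|B" syntax as a sketch. The semantics are assumptions: any already-installed alternative satisfies the declaration, the last alternative is treated as the default to install, and an explicit user preference always wins. The method name is hypothetical.

```ruby
# Resolves one declaration such as "mariadb|mysqlclient".
# Returns the package name to install, or nil if nothing is needed.
def package_to_install(declaration, installed:, preferred: nil)
  alternatives = declaration.split('|')
  if preferred
    unless alternatives.include?(preferred)
      raise ArgumentError, "#{preferred} is not one of #{alternatives.join(', ')}"
    end
    # An explicit choice wins, even when another alternative is installed.
    return installed.include?(preferred) ? nil : preferred
  end
  return nil if alternatives.any? { |name| installed.include?(name) }
  alternatives.last
end
```

For instance, package_to_install("mariadb|mysqlclient", installed: ["mariadb"], preferred: "mysqlclient") returns "mysqlclient", which covers the case of a forgotten mariadb install.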

I would argue that this dilemma should be explicitly decided by the user via either an ENV variable that they pass in or some kind of special --with-* or --use-* option flag that's passed to gem install. If they do not specify which library they would prefer to compile against, then the gem's default package dependencies should be installed and the extconf script should find the desired library. As a general rule of thumb, every gem should have reliable defaults that it can fall back on so that it can be installed without requiring explicit user interaction. Maybe a --no-install-deps option flag could be added to gem install to allow users to bypass the default dependencies in case they really do not want the external library installed from the system's package manager?

Only issue with native-package-installer/pkg-config is they are not really suitable for non-C-extension gems that also require runtime dependencies, such as FFI bindings and command wrapper gems (ex: graphviz). One could check for the presence of a library when the gem is loaded, and then optionally install it if it's missing, but this would slow down load-time and possibly cause sudo commands to be accidentally invoked in production. One could add an empty extconf.rb file to use native-package-installer/pkg-config only to install the necessary dependencies, however rubygems code expects a C extension to be built if gemspec.extensions contains ext/extconf.rb; however, I discovered this is not the case if gemspec.extensions contains a path to a Rakefile. Although, it seems weird to have to (ab)use rubygems C extension build system simply to install a known dependency from the package manager. Seems like native-package-installer/pkg-config are better suited for C extensions, especially those with complex dependency requirements.

@kou

kou commented Jul 11, 2022

(It seems that we will continue the discussion here. I'll comment here later. Please wait for a while...)

@kou

kou commented Jul 14, 2022

Naming

I like "package dependency". Because this feature just installs one or more packages. This feature doesn't install a dependency by building it.

How to specify metadata

I think that we request a new attribute for this instead of reusing metadata, requirements and so on. Anyway, we can discuss this with RubyGems developers after we create an issue for this on rubygems/rubygems. I think that we can focus on what metadata we need for this feature.

What metadata are needed

I think that the following metadata are needed:

  • Platform information
  • Package name(s) for each platform
  • How to detect already installed packages
  • Extra information to install packages
    • Package repository name
    • How to add external package repository

Here are some explanations:

Package name key

I think that package manager name as key isn't suitable.

For example, dnf is used by Fedora, Red Hat Enterprise Linux and Red Hat Enterprise Linux rebuild distributions (AlmaLinux, Rocky Linux and so on). But Fedora and Red Hat Enterprise Linux have many differences. For example, Fedora has gtk4-devel but AlmaLinux 8 doesn't have it.

As another example, ALT Linux uses apt-get but its package names are different from Debian GNU/Linux. For example, ALT Linux uses libgtk+3-devel for GTK 3 but Debian GNU/Linux uses libgtk-3-dev for GTK 3.

So I think that platform ID (rhel, redhat or something for Red Hat Enterprise Linux and its rebuilds) is better for key. This is why native-package-installer uses platform ID as key. We may need to accept platform version as optional value such as rhel and rhel-8.
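
A lookup that prefers a versioned key such as rhel-8 and falls back to rhel might look like the sketch below. The method name and the package names in the usage note are hypothetical; how the platform ID and version are obtained is left to the caller.

```ruby
# Looks up packages for a platform, trying "<id>-<major version>" first
# and then the bare "<id>". Returns an empty Array when neither key exists.
def packages_for_platform(declarations, id:, version: nil)
  major = version.to_s[/\A\d+/]
  keys = []
  keys << "#{id}-#{major}" if major
  keys << id
  key = keys.find { |candidate| declarations.key?(candidate) }
  key ? Array(declarations[key]) : []
end
```

With declarations of { 'rhel' => 'libfoo-devel', 'rhel-8' => %w[libfoo-devel libfoo-compat] }, version "8.6" would select the rhel-8 entry and version "9.0" would fall back to rhel.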

(We may resolve the above case by the "one of packages" feature described in mysql2 gem case. One of libmysqlclient and libmariadb is required in the case.)

How to specify "one of packages"

I agree to the "check for package A, otherwise install package B" approach, with users able to specify explicitly which one is used.

How to detect already installed packages

I agree to detection by pkg-config command. But I suggest pkg-config instead of pkg for key. Because we may add more information to detect installed packages later. For example, we may use library name such as xml that finds libxml.so on Linux and libxml.dylib on macOS.

For toolchain and pkg-config command, I think that we don't need to install them automatically. I think that it's already installed by ruby-dev/ruby-devel package or RVM/rbenv. If pkg-config command isn't installed, we can skip installed packages check by pkg-config command. Or we can add a new RubyGems option to install pkg-config command automatically if it doesn't exist.

How to install package

Some packages need extra action to be installed.

For example, we need to enable powertools repository to install snappy-devel on AlmaLinux 8:

dnf install -y --enablerepo=powertools snappy-devel

We need to enable crb repository to install snappy-devel on AlmaLinux 9:

dnf install -y --enablerepo=crb snappy-devel

Another example, we need to install epel-release to install re2-devel on AlmaLinux 8 and AlmaLinux 9:

dnf install -y epel-release
dnf install -y re2-devel

But epel-release package doesn't exist in Red Hat Enterprise Linux 8. We need to install it from https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm :

dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
dnf install -y re2-devel

Amazon Linux 2 (based on Red Hat Enterprise Linux 7) uses another command line:

amazon-linux-extras install -y epel
yum install -y re2-devel

Ubuntu has PPA (Personal Package Archive) mechanism: https://help.launchpad.net/Packaging/PPA

Users can add an external package repository by add-apt-repository:

apt -y install software-properties-common
add-apt-repository -y ppa:groonga/ppa
apt install -y libgroonga-dev

As a personal request, I want a feature that registers external APT/Yum repository by .deb/.rpm URL. e.g.:

Because this is one of my use cases.

Scenarios

Scenario 1: runtime-only dependency

I'm using native-package-installer/pkg-config for this scenario with the Rakefile approach mentioned by @postmodern. (I didn't think that this is abuse...)

For example, red-parquet:

Scenario 2: Simple C extension

I'll add metadata to .gemspec and keep the auto package install feature in extconf.rb until all RubyGems versions that don't support the feature reach EOL. But I'm happy with the situation. Because I can remove the auto package install feature later.

I'm OK with putting package information in both .gemspec and extconf.rb. I understand that we don't want to spread the information to multiple places. But we can read .gemspec from extconf.rb and refer to the information in .gemspec in extconf.rb if needed.
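
Reading the gemspec from extconf.rb could be sketched like this, using Gem::Specification.load. The metadata key passed in (for example "apt-package", borrowed from Scenario 2 above) and the method name are assumptions; a dedicated gemspec attribute would change the lookup.

```ruby
require 'rubygems'

# Loads a gemspec file and returns the whitespace-separated package names
# stored under +metadata_key+, or an empty Array if the key is absent.
def packages_declared_in(gemspec_path, metadata_key)
  spec = Gem::Specification.load(gemspec_path)
  raise ArgumentError, "could not load #{gemspec_path}" unless spec
  spec.metadata.fetch(metadata_key, '').split
end
```

This would keep the gemspec as the single place where package names are written down, while extconf.rb still owns detection and compiler flags.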

Scenario 3.a: C extension with "use whichever" optionality

(See my comment in 'How to specify "one of packages"'.)

Scenario 3.b: C extension with "choose one at install time" optionality

Hmm, it may be better that sqlite3-ruby is split to sqlite3-ruby and sqlcipher-ruby. Then users can install libsqlite3 bindings and libsqlcipher bindings at once.

LGPL

As I mentioned in sparklemotion/nokogiri#1488 (comment) , people who objected misunderstand LGPL.

In addition, in the Nokogiri use case, extconf.rb is only "Application" and "Combined Work" in LGPLv3. lib/**/*.rb and nokogiri.so are neither "Application" nor "Combined Work" in LGPLv3. So applications which use only lib/**/*.rb and nokogiri.so aren't related to LGPLv3.

Anyway... "install-time dependency" will satisfy people who objected native-package-installer/pkg-config use.

Alternative: RubyGems plugin

We may be able to implement this feature by RubyGems' plugin feature: https://guides.rubygems.org/plugins/

For example, https://github.com/voxik/gem-nice-install has similar feature.

But this requires users to install a RubyGems plugin explicitly before users install the target gem.

@flavorjones

@kou Thank you for your response above.

I like "package dependency"

I like this!

I think that we request a new attribute for this instead of reusing metadata, requirements and so on

OK!

I think that package manager name as key isn't suitable.

Your explanation of this was very helpful, thank you. @postmodern are you OK with using a "platform ID" as the key?

(postmodern): we could create our own syntax to say "check for package A, otherwise install package B", and it could look like "mariadb|mysqlclient ..."
(kou) I agree to the "check for package A, otherwise install package B"

OK, I don't object to this.

I agree to detection by pkg-config command. But I suggest pkg-config instead of pkg for key.

I'm OK with this. @postmodern what do you think?

For toolchain ... I think that we don't need to install them automatically

OK, we can always consider this later if we need it.

Some packages need extra action to be installed ...

Your explanation here was very helpful, too. Would you be OK if an RFC was created and supporting these cases was discussed there?

Scenario 1 ... I didn't think that this was abuse

I like this approach! I don't think this is abuse, either.

Scenario 2 ... we can read .gemspec from extconf.rb if needed

Fair! OK.

Scenario 3.b ... it may be better that sqlite3-ruby is split to sqlite3-ruby and sqlcipher-ruby

Fair! sqlcipher support is a bit of a hack and should not delay this RFC.

...

@postmodern Do you feel OK with moving to an RFC at this point?

@flavorjones

Clarifying one point above: for the runtime-only dependencies, the declaration in the gemspec would be sufficient for gem/bundler to install those dependencies. @kou's point about using a Rakefile not abusing the C extension mechanism is orthogonal to the feature being proposed.

@postmodern
Author

postmodern commented Sep 13, 2022

@kou

Scenario 1: runtime-only dependency

I'm using native-package-installer/pkg-config for this scenario with the Rakefile approach mentioned by @postmodern. (I didn't think that this is abuse...)

Eh, I respectfully disagree. Using gemspec.extensions with the Rakefile trick to install additional runtime-only dependencies, for Ruby gems which do not contain C extensions, seems like we are using gemspec.extensions and the ext-builder API within rubygems for something they were not originally designed for; hence why I call it "abuse". Perhaps, if we added a new gemspec.post_installation_script attribute to run an additional script, that might be a more appropriate place to add custom logic which executes during the installation process. However, there are obviously security concerns with adding another attribute which can execute arbitrary code during installation. This is why I prefer static and declarative data in the gemspec, with rubygems handling the logic, when possible of course.

I think there are actually two possible RFCs here which we could submit.

  1. Add a new gemspec attribute for declaring external (build or runtime) dependencies and logic in rubygems to check for and/or handle installing the declared dependencies. This is a far more ambitious RFC and contains all of the discussions about syntax and how to represent optional or alternative package names, etc.
  2. Add a new mkmf method called install_packages (or something similar) so that C extensions with complex extconf.rb files, or C extensions which need to support being installed on older versions of rubygems, can explicitly install additional external build-time or runtime dependencies. This is definitely the more pragmatic and doable RFC. We already have many C extensions with complex extconf.rb files that use mkmf to check for the existence of pre-installed libraries, so adding install_packages method calls to them should not be that difficult.
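
To make option 2 a little more concrete, here is a minimal sketch of the command-building half of such a helper. The name `install_command_for` and the `INSTALL_COMMANDS` table are hypothetical and are not part of mkmf or rubygems; only `have_library` (shown in the trailing comment) is an existing mkmf method.

```ruby
# Hypothetical sketch: none of these names exist in mkmf today.
# Maps a package manager executable to the argv prefix used to install packages.
INSTALL_COMMANDS = {
  "apt-get" => %w[apt-get install -y],
  "dnf"     => %w[dnf install -y],
  "brew"    => %w[brew install]
}.freeze

# Build the full argv for installing `packages` with `package_manager`.
# Raises ArgumentError for package managers the table does not know about.
def install_command_for(package_manager, packages)
  base = INSTALL_COMMANDS.fetch(package_manager) do
    raise ArgumentError, "unsupported package manager: #{package_manager}"
  end
  base + packages
end

# In an extconf.rb this might be used roughly as follows
# (left as a comment so the sketch itself has no side effects):
#
#   require "mkmf"
#   unless have_library("sqlite3")
#     system("sudo", *install_command_for("apt-get", %w[libsqlite3-dev]))
#   end
```

Keeping the command construction separate from the call to `system` makes the helper easy to test and lets the caller decide whether to prefix it with `sudo`.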

Also, I am currently heads down trying to wrap up a very large refactor of an old project (only 62 remaining issues!), so I cannot estimate how much time I can allocate to the RFC process or development right now. :/

@postmodern

@flavorjones

I think that package manager name as key isn't suitable.

Your explanation of this was very helpful, thank you. @postmodern are you OK with using a "platform ID" as the key?

I would only suggest that we allow for a generic platform ID for all Debian-based distros or all RedHat-based distros, as there are many packages which have the same name between Debian and Ubuntu or RHEL and Fedora. This would reduce having to specify the same package name for every Debian or RedHat derivative platform.
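
One possible way to get that generic fallback, sketched under the assumption that the platform ID comes from the `ID` and `ID_LIKE` fields of `/etc/os-release`. The helper names and the shape of the dependencies hash are hypothetical; nothing here is an agreed syntax.

```ruby
# Parse the KEY=value lines of an os-release file into a Hash,
# stripping one pair of surrounding quotes from each value.
def parse_os_release(text)
  text.each_line.with_object({}) do |line, fields|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2)
    next if value.nil?
    fields[key] = value.gsub(/\A["']|["']\z/, "")
  end
end

# Most specific ID first, followed by the "parent" distros from ID_LIKE.
def platform_ids(fields)
  [fields["ID"], *fields.fetch("ID_LIKE", "").split].compact
end

# Return the package list for the first platform ID that has an entry, or nil.
def packages_for(dependencies, ids)
  id = ids.find { |candidate| dependencies.key?(candidate) }
  id && dependencies[id]
end

# Example: an Ubuntu host falls back to the generic "debian" entry.
fields = parse_os_release("ID=ubuntu\nID_LIKE=debian\n")
packages_for({ "debian" => %w[libsqlite3-dev] }, platform_ids(fields))
```

With this lookup order a gem author would only need a `"ubuntu"` entry when the package name actually differs from Debian's.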

I agree to detection by pkg-config command. But I suggest pkg-config instead of pkg for key.

I'm OK with this. @postmodern what do you think?

I'm OK with this. I am assuming the majority of package names match their pkg-config name, so this shouldn't be too difficult to guess or look up.

For toolchain ... I think that we don't need to install them automatically

OK, we can always consider this later if we need it.

Agreed. If the ruby was compiled from source then a toolchain was already installed. If the ruby was installed from the package manager, then the package manager hopefully also installed the toolchain packages (this is the case with ruby-dev and ruby-devel packages).

Scenario 2 ... we can read .gemspec from extconf.rb if needed

What about the edge case when there are multiple .gemspec files in the unpacked gem source directory? Or would we query the rubygems API for the Gem::Specification of the gem which is being installed? Maybe this is something an install_package/install_packages mkmf method could handle?
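
As a rough illustration of the multiple-gemspec case, here is one way an extconf.rb could pick a file, assuming it already knows the gem's name (that assumption is the weak point). `locate_gemspec` is a hypothetical helper; `Gem::Specification.load`, shown in the trailing comment, is the existing rubygems call.

```ruby
# Hypothetical helper: choose which .gemspec in `dir` to read.
# Prefers "<gem_name>.gemspec"; otherwise accepts a single unambiguous file.
def locate_gemspec(dir, gem_name)
  exact = File.join(dir, "#{gem_name}.gemspec")
  return exact if File.file?(exact)

  candidates = Dir.glob(File.join(dir, "*.gemspec"))
  unless candidates.size == 1
    raise ArgumentError,
          "expected exactly one .gemspec in #{dir}, found #{candidates.size}"
  end
  candidates.first
end

# Usage from an extconf.rb might look like:
#
#   spec = Gem::Specification.load(locate_gemspec(gem_root, "sqlite3"))
```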

Scenario 3.b ... it may be better that sqlite3-ruby is split to sqlite3-ruby and sqlcipher-ruby

Fair! sqlcipher support is a bit of a hack and should not delay this RFC.

I agree. It would be a lot of work to support defining either/or external dependencies and then detecting whether they are pre-installed or prompting the user to select which one they wish to install. I feel like an extconf.rb file could better handle such complex logic. Creating separate ruby libraries for each C library backend is also an option.

@postmodern Do you feel OK with moving to an RFC at this point?

I am mostly OK with the proposal for adding a new gemspec attribute for installing build/runtime dependencies. Some of the syntax and constraints of what can actually go into the gemspec will probably be further discussed once we submit an RFC to rubygems. Although, I would like to see an example of the newly proposed syntax using platform ID and pkg-config names.

@byroot

byroot commented Sep 13, 2022

Security Concerns

Should rubygems/bundler even attempt to install the package automatically though? On most systems that would require elevated permissions, which means a password prompt etc. And if you do a password prompt, you need to check whether you are in a TTY or not, otherwise you might hang forever etc.

IMHO just checking the package exists and giving a clear error message to the user with maybe the command to run should be enough, no?

@postmodern

@byroot this could be addressed by simply checking the return value of sudo and printing the appropriate error message, or by checking $stdout.tty? and, when there is no TTY, falling back to only checking whether the package is already installed and, if not, printing an error message instructing the user to install the associated package name(s).
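
A minimal sketch of that fallback logic, with hypothetical helper names and `apt-get` used purely as an example package manager. The `runner` parameter is an assumption added so the sketch can be exercised without actually invoking sudo.

```ruby
# Hypothetical helper: returns true if the package is (now) available,
# false if the user has to install it by hand.
def handle_missing_package(package,
                           installed:,
                           interactive: $stdout.tty?,
                           runner: ->(argv) { system(*argv) })
  return true if installed

  unless interactive
    # No TTY: never try to prompt, just tell the user what to do.
    warn "#{package} is not installed. Please install it and try again."
    return false
  end

  # Kernel#system returns true on exit status 0, false on a non-zero
  # status, and nil if the command could not be executed at all.
  return true if runner.call(["sudo", "apt-get", "install", "-y", package])

  warn "Installing #{package} failed. Please install it manually."
  false
end
```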

@kou

kou commented Sep 13, 2022

Add a new mkmf method called install_packages (or something similar) ...

This is the approach that pkg-config/native-package-installer does. If we're OK with the approach, we don't need to propose a new feature to RubyGems.

And if you do a password prompt, you need to check whether you are in a TTY or not, otherwise you might hang forever etc.

sudo fails without a TTY, so we don't need to do anything for this case.

native-package-installer already has the feature https://github.com/ruby-gnome/native-package-installer/blob/master/lib/native-package-installer.rb#L60-L63 and it works (no hang, raises an exception instead) on CI.

giving a clear error message to the user with maybe the command to run

native-package-installer already has the feature https://github.com/ruby-gnome/native-package-installer/blob/master/lib/native-package-installer.rb#L83-L87 . It's shown only when sudo ... fails.

@byroot

byroot commented Sep 13, 2022

sudo fails without a TTY.

It was one example among many. Docker container builds hanging forever because apt is waiting on a prompt is a common problem (generally solved with DEBIAN_FRONTEND=noninteractive). There is likely a very long tail of problems like this.
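
For reference, a small sketch of how the DEBIAN_FRONTEND=noninteractive workaround could be applied from Ruby. The helper name is hypothetical; the environment variable and the `--yes` flag of `apt-get` are the standard ones.

```ruby
# Hypothetical helper: build the environment and argv for an apt-get
# install that never stops to ask the user a question.
def noninteractive_apt_install(packages)
  env  = { "DEBIAN_FRONTEND" => "noninteractive" }
  argv = ["apt-get", "install", "--yes", *packages]
  [env, argv]
end

# Kernel#system accepts an environment Hash as its first argument:
#
#   env, argv = noninteractive_apt_install(%w[libyaml-dev])
#   system(env, *argv)
```

This only covers apt; other package managers would need their own equivalent, which is part of the long tail.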

What I'm trying to get at is that I suspect rubygems/bundler will likely consider this a huge can of worms they won't have any desire to support. Hence why I'm suggesting to include a scaled-down proposal in case they don't want to go all the way.

But it's merely a suggestion really.

@kou

kou commented Sep 14, 2022

There are many Dockerfiles that work without a TTY. So I think that there is no technical difficulty around TTY.

@voxik

voxik commented Sep 14, 2022

Several random notes (and apologies if something was already mentioned and I missed it).

Dependency specification

What if the dependencies were specified completely differently than by package manager or distribution? What if the initial example looked like this:

gemspec.metadata['external_dependencies'] = {
  'sqlite' => %w[libsqlite3-dev sqlite-devel sqlite],
  # ...
}

Then the DNF implementation would be:

$ sudo dnf install libsqlite3-dev sqlite-devel sqlite --skip-broken
Last metadata expiration check: 1:27:14 ago on Wed Sep 14 15:17:09 2022.
No match for argument: libsqlite3-dev
Dependencies resolved.
================================================================================
 Package             Architecture  Version                 Repository      Size
================================================================================
Installing:
 sqlite-devel        x86_64        3.39.3-2.fc38           rawhide        143 k
Installing dependencies:
 sqlite              x86_64        3.39.3-2.fc38           rawhide        799 k

Transaction Summary
================================================================================
Install  2 Packages

Total download size: 942 k
Installed size: 2.2 M
Is this ok [y/N]: 
  1. I am not sure how other package managers would cope with this, but I'd assume they could handle or ignore non-existent dependencies.
  2. Apparently, the sqlite package should not be installed 🤷‍♂️
  3. However, if the rubygems.org API provided a way to easily obtain this metadata, I think we could come up with some way to help RubyGems pick only the right dependencies, e.g. by using some virtual provides on Fedora / RHEL.

Package managers vs distros

I don't think that there is a clear winner. E.g. YUM and DNF might or might not mean the same thing, and the package names on RHEL might or might not differ from the package names on Fedora.

Don't add too much know how about package managers into RubyGems

I might be biased as a co-author of https://github.com/voxik/gem-nice-install, but I think that the plugin way was nice. If there is the XYZ distribution, their maintainers might provide a plugin for their ABC packager. I don't think that RubyGems should know too much about any other package managers or distros.

Just FTR, the gem-nice-install arguably implemented support for 3 package managers on Fedora:

https://github.com/voxik/gem-nice-install/blob/master/lib/rubygems/nice_install/fedora_ext_installer.rb

where the PackageKit way is prioritized, because it allows a nice prompt for elevated user privileges. The YUM and DNF support was implemented because, at that time, YUM was the default and DNF was the new kid on the block.

@postmodern

@voxik you should probably read the previous comments. The original suggestion was to group packages by package manager (ex: "dnf" => %w[...], "apt" => %w[...], ...). The new proposal is to group them by platform ID (ex: "rhel" or "fedora") and use a combination of pkg-config package names and package manager specific package names, which allows for both testing if the library was already installed and installing it via the system's package manager.

I think updated gemspec examples for the proposal would help clear up any confusion.
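
Purely as an illustration of the shape being discussed, and not an agreed syntax: the sketch below assumes a hash keyed by pkg-config name (used for detection) whose values map platform IDs to package names (used for installation). The constant and helper names are hypothetical; `pkg-config --exists` is the real pkg-config flag.

```ruby
# Illustrative only: neither the attribute name nor this layout was
# agreed on in the discussion above.
EXTERNAL_DEPENDENCIES = {
  "sqlite3" => {                     # pkg-config name, used for detection
    "debian" => %w[libsqlite3-dev],  # platform ID => package names
    "fedora" => %w[sqlite-devel]
  }
}.freeze

# True when `pkg-config --exists NAME` exits with status 0.
def pkg_config_present?(name, runner: ->(argv) { system(*argv) })
  runner.call(["pkg-config", "--exists", name]) == true
end

# Packages that would still need installing on the given platform.
def missing_packages(dependencies, platform_id, present: method(:pkg_config_present?))
  dependencies.flat_map do |pkg_config_name, by_platform|
    next [] if present.call(pkg_config_name)
    by_platform.fetch(platform_id, [])
  end
end
```

An unknown platform ID simply yields no packages here; a real implementation would more likely report that it does not know what to install.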

@voxik

voxik commented Sep 14, 2022

My proposal was to avoid such groups. I'm not saying it is the best proposal, but it is something to consider. Because, as I said, I'd advise against making assumptions.

BTW speaking of pkg-config, you can do dnf install "pkgconfig(sqlite3)" to get sqlite-devel installed on Fedora. These virtual provides should be reliable, because they are autogenerated.
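
A small sketch of how that could be used on Fedora: turn pkg-config names into `pkgconfig(...)` virtual provides and hand them to dnf. The helper names are hypothetical; the `pkgconfig(NAME)` form is the one shown above.

```ruby
# Wrap each pkg-config name in the pkgconfig(...) virtual provide form.
def dnf_pkgconfig_args(pkg_config_names)
  pkg_config_names.map { |name| "pkgconfig(#{name})" }
end

# Build the argv for installing those provides with dnf.
def dnf_install_argv(pkg_config_names)
  ["dnf", "install", *dnf_pkgconfig_args(pkg_config_names)]
end

# Passing argv elements separately to Kernel#system avoids shell quoting
# issues with the parentheses:
#
#   system("sudo", *dnf_install_argv(%w[sqlite3]))
```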

@flavorjones

Note for posterity: psych 5 no longer bundles libyaml, and the absence of a libyaml-dev distro package in many CI images is causing builds to fail. See ruby/setup-ruby#409 for background. A proposal like this, if it is adopted and used, could prevent breakage in similar scenarios in the future.

I DMed with @postmodern and I'm going to try to turn this into a real RFC in the next few weeks.
