@olpaw
Last active November 17, 2021 12:52
Restricting the scope of META-INF/native-image configs

Motivation

To let native-image users and library vendors ship native-image configuration data with their projects and libraries, GraalVM native-image supports placing native-image configuration files in the resource location META-INF/native-image.

This is described in detail in https://github.com/oracle/graal/blob/master/docs/reference-manual/native-image/BuildConfiguration.md

While this works reasonably well for monolithic projects, it turned out to cause issues in real-world projects composed of many smaller projects/libraries. Often META-INF/native-image configs are written in such a way that they affect not only the native-image configuration of the project they are intended for, but also that of other projects. This can easily happen with snippets like:

{
  "resources": {
    "includes": [
      {"pattern": ".*/l10n.properties"}
    ]
  }
}

Having this in a resource-config.json will cause inclusion of all resource paths matching that pattern. The image builder is not able to restrict the allowed resource paths to the packages that are within the same jar file or directory as the META-INF/native-image config.
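To make the effect concrete, here is a minimal Python sketch of that pattern applied to resources from several classpath entries. The jar names and resource paths are made up for illustration, and the sketch assumes the pattern is matched against the whole resource path; Python's re module only approximates the regular-expression dialect the image builder uses.

```python
import re

# The "pattern" value from the resource-config.json snippet above.
PATTERN = re.compile(r".*/l10n.properties")

# Hypothetical resource paths contributed by three unrelated classpath entries.
CLASSPATH_RESOURCES = {
    "foo.jar": ["com/foo/l10n.properties", "com/foo/logo.png"],
    "bar.jar": ["org/bar/ui/l10n.properties"],
    "utils.jar": ["net/utils/l10n.properties", "net/utils/defaults.properties"],
}


def included_resources(pattern, resources_by_entry):
    """Return every (classpath entry, resource path) pair the pattern selects."""
    return [
        (entry, path)
        for entry, paths in resources_by_entry.items()
        for path in paths
        if pattern.fullmatch(path)  # assumption: the whole path must match
    ]


included = included_resources(PATTERN, CLASSPATH_RESOURCES)
for entry, path in included:
    print(f"{entry}: {path}")
```

Even if the config was only meant for foo.jar, the l10n.properties files of bar.jar and utils.jar are selected as well.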

The same issue exists for the native-image options --initialize-at-build-time and --initialize-at-run-time. Using them without also passing a comma-separated list of packages and classes causes them to affect packages outside of the place where they are used. For example, if any jar file on the classpath uses a bare --initialize-at-build-time, all classes from all other classpath entries are also initialized at image build time.

Classpath entries are meaningless for individual resource path entries

The reason is that the Java classpath treats two classpath entries the same way as a single entry that holds the combined contents of the two. This is what allows the creation of so-called uber-jars, where multiple jars with dependency relationships are combined into a single jar for easy usage:

java -jar my-uber.jar

vs.

java -cp foo.jar:bar.jar:foobar.jar:base.jar:utils.jar my.app.Main

The downside of this convenience is that we cannot attach any meaning to the place where a given META-INF/native-image config is located. A user could always make a jar part of an uber-jar, and then the original location of that config (i.e. the jar file or directory in which the META-INF/native-image/... resources originally resided) is lost.
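The loss of origin information can be sketched in a few lines of Python using in-memory zip archives. The jar contents and the naive "first copy wins" merge strategy are assumptions for illustration only; real uber-jar tools differ in how they resolve duplicates.

```python
import io
import zipfile


def make_jar(entries):
    """Build an in-memory jar (a zip archive) from a {path: content} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for path, content in entries.items():
            jar.writestr(path, content)
    return buf.getvalue()


def merge_jars(jars):
    """Naively merge jars into one uber-jar; the first copy of a path wins."""
    buf = io.BytesIO()
    seen = set()
    with zipfile.ZipFile(buf, "w") as uber:
        for data in jars:
            with zipfile.ZipFile(io.BytesIO(data)) as jar:
                for name in jar.namelist():
                    if name not in seen:
                        seen.add(name)
                        uber.writestr(name, jar.read(name))
    return buf.getvalue()


# Hypothetical library jars: only foo.jar ships a native-image config.
foo_jar = make_jar({
    "com/foo/Foo.class": b"",
    "META-INF/native-image/com.foo/foo/resource-config.json": b"{}",
})
bar_jar = make_jar({
    "com/bar/Bar.class": b"",
    "com/bar/l10n.properties": b"greeting=hello",
})

uber_jar = merge_jars([foo_jar, bar_jar])
with zipfile.ZipFile(io.BytesIO(uber_jar)) as uber:
    uber_entries = sorted(uber.namelist())

for name in uber_entries:
    print(name)
```

After the merge, the config sits next to the classes and resources of both libraries, and nothing in the archive records that it originally came from foo.jar.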

Using Java Modules to restrict the scope of META-INF/native-image configs

When Oracle introduced Java Modules, the issue described above got fixed. Now there is a module-path, and each entry on the module-path has to be a module. It is no longer possible to combine two modules into one by copying their contents into a single jar (or directory). There is no such thing as an uber-module. While this at first sounds like a step backwards, it is actually exactly what is needed. Now we can finally attach meaning to where a given META-INF/native-image config is located.

We can define that:

  • A META-INF/native-image config located in a module is only allowed to affect classes and resources located within that same module.
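A minimal Python sketch of this rule for resource patterns follows. The module names, resource paths and the function module_scoped_resources are hypothetical, and the sketch assumes whole-path matching; it is not how the image builder is implemented.

```python
import re

# Hypothetical modules and the resources each one contains.
MODULES = {
    "com.foo": ["com/foo/l10n.properties", "com/foo/logo.png"],
    "org.bar": ["org/bar/ui/l10n.properties"],
}


def module_scoped_resources(pattern, config_module, modules):
    """Apply a resource pattern only to the module that ships the config."""
    regex = re.compile(pattern)
    return [path for path in modules[config_module] if regex.fullmatch(path)]


# The config lives in module com.foo, so only com.foo resources are candidates.
selected = module_scoped_resources(r".*/l10n.properties", "com.foo", MODULES)
print(selected)
```

The broad pattern still matches org/bar/ui/l10n.properties textually, but that resource is never considered because it belongs to a different module.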

Using per-package META-INF/native-image configs

Orthogonal to the above solution for the module-path, we also want a way to restrict the scope of META-INF/native-image configs found on the classpath. This can be achieved by adding meaning to the name of the resource subdirectory where the config files are located. As shown in the manual, the recommended place for native-image config files is in subdirectories. So while it is allowed to place the config files in any arbitrarily nested subdirectory within META-INF/native-image, the following layout is recommended.

META-INF/
└── native-image
    └── <groupID>
        └── <artifactID>
            └── resource-config.json

We can naturally extend this scheme by adding meaning to the name of the immediate parent directory in which the actual config files reside. For example:

META-INF/
└── native-image
    └── <groupID>
        └── <artifactID>
            └── <package name specifier>
                └── resource-config.json

We can now define the following behaviour:

  • If the native-image builder finds native-image config files in a directory whose name is a package name specifier, and packages exist on the classpath that match the package name specifier, then the scope of those config files is restricted to the packages that match the package name specifier. If no such packages are found, the config files are ignored.

Package name specifier syntax

  • if.com.foo.bar matches: Java package com.foo.bar
  • while.com.foo.bar matches: Any Java package that starts with com.foo.bar

if and while are safe to use because they are reserved keywords in Java and thus can never be part of a legal Java package name.

Filesystem path components created with the above syntax are known to be supported on all our supported platforms (Linux, OSX, Windows).
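The matching rules can be sketched in Python as follows. The function names parse_specifier and config_scope are hypothetical, and the sketch reads "starts with" as a plain string prefix; the proposal does not say whether while.com.foo.bar should also match a package such as com.foo.barbaz.

```python
def parse_specifier(dir_name):
    """Split a directory name into (kind, package); None if it is no specifier."""
    for kind in ("if", "while"):
        prefix = kind + "."
        if dir_name.startswith(prefix) and len(dir_name) > len(prefix):
            return kind, dir_name[len(prefix):]
    return None


def config_scope(dir_name, classpath_packages):
    """Return the packages that the config files in dir_name apply to.

    None means dir_name is not a package name specifier, so the config keeps
    its current unrestricted behaviour. An empty list means no package on the
    classpath matched and the config files are ignored.
    """
    spec = parse_specifier(dir_name)
    if spec is None:
        return None
    kind, package = spec
    if kind == "if":
        return [p for p in classpath_packages if p == package]
    # "while": plain string prefix (assumption, see the note above).
    return [p for p in classpath_packages if p.startswith(package)]


# Hypothetical packages found on the classpath.
packages = ["com.foo.bar", "com.foo.bar.impl", "com.foo.baz", "org.other"]

print(config_scope("if.com.foo.bar", packages))
print(config_scope("while.com.foo.bar", packages))
print(config_scope("while.net.missing", packages))
print(config_scope("my-artifact", packages))
```

Returning None for ordinary directory names keeps existing configs working unchanged, which separates "not scoped" from "scoped, but nothing matched".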

Consequences

Implementing this proposal allows us to make unqualified use of --initialize-at-build-time (and --initialize-at-run-time) acceptable again as long as they are only used within config directories.

Eventually the same policy could also be applied for native-image options within a Java module.

Implementation effort

Implementing the above proposal is relatively straightforward because as of https://github.com/oracle/graal/commit/6b37bb9f85441a5a36e52cf4d49233d14b08c941 the image-builder now knows for each HostedOption where it originated from:

  • Path of classpath/module-path entry (directory or jar)
  • Resource-path location within jar-file
@olpaw

olpaw commented Nov 15, 2021

As people already mentioned in the Compatibility and Community meeting, we could also take this as an opportunity to restrict what kind of native-image configurations we actually want to allow in per-package META-INF/native-image configs. E.g. we could disallow using Args and JavaArgs in native-image.properties within per-package META-INF/native-image configs (except for --features=..., we have to allow that somehow).

@gradinac

I like the idea of being able to restrict the scope of the native-image configuration for both module and classpath image builds. One note is that while most libraries should ideally need to only configure themselves, there will most likely be edge cases where they will need to also provide configuration for e.g. a third party closed-source library that is no longer being maintained. As far as I see from the proposal, this would still be possible.

I really like the idea behind package name prefixes (if and while), as they would basically guarantee that we don't accidentally pick up a random configuration directory as a package configuration directory, and they neatly allow us to differentiate between matching a package name (if) and a package prefix (while). The added advantage is that they are relatively short - less typing :). However, I think it may be a bit confusing for someone looking at it for the first time (though I think with time people would get used to it). Brainstorming another idea for this:
We could introduce special top-level folders in META-INF/native-image that would imply the scope of configuration files within them - e.g., META-INF/native-image/package-scoped-config and META-INF/native-image/package-prefix-scoped-config. We could then use package names as directory names within those folders:

  • com.foo.bar in package-scoped-config matches only com.foo.bar
  • com.foo.bar in package-prefix-scoped-config matches any package that starts with com.foo.bar

I also agree with restricting the kind of configuration we allow in per-package configs. Disallowing Args in native-image.properties would work - we could then introduce a new property for everything that we want to allow - e.g., for features, we could add a Features property.

@olpaw

olpaw commented Nov 17, 2021

As far as I see from the proposal, this would still be possible.

Yes. I do not suggest taking away anything that we currently have, only adding the variations described above. We do not want to break existing setups, but we want to provide a path for users and library developers to switch to a safer form of native-image configuration for their code/libraries that requires as few changes as possible on their side.

I also agree with restricting the kind of configuration we allow in per-package configs. Disallowing Args in native-image.properties would work - we could then introduce a new property for everything that we want to allow - e.g., for features, we could add a Features property.

Yes, allowing Features = fully.qualified.name.of.my.Feature would be the way to go. That would also make it easy for us to check that all specified Feature classes are indeed within the scope of the config (as defined by the <package name specifier> the native-image.properties file is contained in).

We could introduce special top level folders in META-INF/native-image that would imply the scope of configuration files within them - e.g., META-INF/native-image/package-scoped-config

I agree, that would also be viable (although a bit more verbose).
