@KOLANICH
Last active August 31, 2020 19:03
PEP 9999: A metadata format for entry points

PEP: 9999
Title: A metadata format for entry points
Author: KOLANICH
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 29-Jul-2020
Python-Version: 3.9
Post-History:

Abstract

In this PEP we propose a format for annotating entry points with metadata, available before actually loading an entry point, which mitigates some resource consumption, security and legal issues.

Motivation

Entry points [EPs] provide a mechanism for Python packages to implement a plugin system. While the needed metadata can currently be exposed via a custom API defined specifically for a plugin system, doing so causes several problems, because one may not want to load a plugin at all, depending on various conditions:

  • a plugin may simply be unneeded for a given use case, so executing its initialization code is wasteful. For example, there may be several plugins with different features, and a user may need only one of them.
  • a plugin can be licensed under a restrictive license. It is common practice to license software depending on whether a certain optional dependency is used. For example, if a library A can optionally use a GPL-licensed library G without meeting the conditions that would make A a derivative work of G (G provides some non-essential additional functionality), authors often license A under a dual license: a permissive one on the condition that G is not used (and linked), and the GPL for the case where it is used. But loading a GPL-licensed plugin module that depends on G just to retrieve its metadata means A is required to be licensed under the GPL.
  • calling an entry point may be insecure under some conditions.
  • an entry point can require the caller to provide it with some information in the form of function call arguments. Sometimes this kind of information can be encoded into type annotations and argument names, but sometimes it cannot.
  • differing ways of retrieving metadata affect software that processes the metadata of Python packages in generic ways.

All these issues can be solved using a standardized metadata format.

Specification

The proposed new entry point declaration format is

::

<JSON-serialized metadata> = <package name>:<function name>

where <JSON-serialized metadata> is a dict serialized into JSON [JSON] matching the following JSON Schema [JSONSchema].

::

    {
        "type": "object",
        "properties": {
            "N": {"type": "string", "title": "Name of entry point"},
            "P": {
                "type": "integer",
                "title": "Priority of a plugin. Not used by pkg_resources, but just a standardized field for it.",
                "default": 0
            },
            "I": {
                "type": "array",
                "title": "Issues",
                "additionalItems": {
                    "type": "array",
                    "items": [
                        {
                            "type": "string",
                            "title": "ID",
                            "doc": "ID of an issue. Not a free-form string, but an item from a fixed set, specific to the plugin interface. Used to recognize presence of the issues."
                        },
                        {
                            "title": "issue-specific info",
                            "doc": "The info meaningful only for the plugin interface implementation",
                            "default": null
                        }
                    ]
                }
            },
            "E": {
                "title": "Plugin-interface-specific unstandardized info",
                "default": null,
                "doc": "In complying implementations of pkg_resources it is available as a metadata attribute of a pkg_resources.EntryPoint object."
            }
        }
    }
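As an illustration of the declaration format, the following minimal sketch builds a declaration line from the fields of the schema. The helper name ``make_declaration``, the plugin name ``png``, the target ``myplugins.png:load`` and the issue ID ``gpl`` are hypothetical and are not part of this PEP.

```python
import json


def make_declaration(name, target, priority=0, issues=(), extra=None):
    """Build '<JSON-serialized metadata> = <package name>:<function name>'.

    Hypothetical helper: the argument names are illustrative only.
    """
    metadata = {
        "N": name,                        # name of the entry point
        "P": priority,                    # priority of the plugin
        "I": [list(i) for i in issues],   # [ID, issue-specific info] pairs
        "E": extra,                       # plugin-interface-specific info
    }
    # Compact separators keep the serialized metadata short and on one line.
    encoded = json.dumps(metadata, separators=(",", ":"))
    return f"{encoded} = {target}"


line = make_declaration("png", "myplugins.png:load", priority=10,
                        issues=[("gpl", None)])
print(line)
```

The resulting line can be placed wherever a legacy ``<entry point name> = <package name>:<function name>`` declaration would go.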

In the future, the root dict may be extended with keys consisting of any single ASCII character that does not require escaping.

For the purpose of this spec, everything between the beginning of the string and the last = is considered to be <JSON-serialized metadata>.

If <JSON-serialized metadata> is not well-formed JSON, or is JSON that does not match the schema, the implementation MUST fall back to the legacy behavior: it sets the name exactly as previous versions of pkg_resources would, and sets metadata to None.
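The parsing and fallback rules above can be sketched as follows. This is only an illustration, not the behavior of any released pkg_resources: the function name ``parse_declaration`` is hypothetical, and the type checks are a simplified stand-in for real validation against the schema (in particular, requiring ``N`` to be present is an assumption of this sketch).

```python
import json


def parse_declaration(line):
    """Return (name, target, fields); fields is None for legacy names."""
    # Everything before the last '=' is the candidate metadata.
    left, _, target = line.rpartition("=")
    left, target = left.strip(), target.strip()
    try:
        fields = json.loads(left)
        # Simplified structural check (assumption: a full implementation
        # would validate against the JSON Schema instead).
        if not isinstance(fields, dict) or not isinstance(fields.get("N"), str):
            raise ValueError("metadata does not match the schema")
        if not isinstance(fields.get("P", 0), int):
            raise ValueError("priority must be an integer")
        if not isinstance(fields.get("I", []), list):
            raise ValueError("issues must be an array")
    except ValueError:
        # Legacy behavior: the whole left-hand side is the name.
        return left, target, None
    return fields["N"], target, fields


new_style = parse_declaration('{"N":"png","P":10} = myplugins.png:load')
legacy = parse_declaration("png = myplugins.png:load")
no_name = parse_declaration('{"P": 1} = a:b')
```

``json.JSONDecodeError`` is a subclass of ``ValueError``, so malformed JSON and schema mismatches take the same fallback path.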

All the information except under E must be

Backwards Compatibility

The legacy entry points description format was

::

<entry point name> = <package name>:<function name>


The proposal doesn't restrict the set of well-formed entry point names, but it changes the interpretation of some of them. More precisely, every name that is itself well-formed JSON will require changes. Such names look like an attempt to recreate the metadata system described in this PEP, and their authors should probably modify their code to use it instead. Other uses of names won't require any code changes.
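A project could check whether any of its existing names would be affected with a small test such as the one below. This is a rough sketch: ``is_reinterpreted`` is a hypothetical helper, and it only checks that the name parses as a JSON object, which approximates matching the schema rather than validating against it.

```python
import json


def is_reinterpreted(name):
    """True if a legacy name parses as a JSON object.

    Approximation: such a name would be a candidate for being treated as
    metadata rather than as a plain name.
    """
    try:
        return isinstance(json.loads(name), dict)
    except ValueError:
        return False


plain = is_reinterpreted("my-plugin")
json_like = is_reinterpreted('{"N": "my-plugin"}')
number = is_reinterpreted("42")
```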

Forward Compatibility

Entry point names can contain special characters. We exploit this fact: legacy implementations of pkg_resources, which can be detected by the absence of the metadata attribute, treat the metadata as the name. Software that supports such legacy versions of pkg_resources can parse that name manually using straightforward code:

::

    import json

    # Complying pkg_resources: the metadata is exposed directly.
    if hasattr(ep.__class__, "__slots__") and "metadata" in ep.__class__.__slots__:
        metadata = ep.metadata
    else:
        # Legacy pkg_resources: the JSON metadata is stored in the name.
        encoded = ep.name
        metadata = None
        if encoded.startswith("{"):
            try:
                metadata = json.loads(encoded)
                ep.priority = metadata.get("P", 0)
                ep.issues = metadata.get("I", ())
                ep.metadata = metadata.get("E", None)
                ep.name = metadata["N"]
            except Exception:
                # Not valid metadata: treat the name as a legacy name.
                metadata = None
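One use of metadata that is available before loading is choosing which plugins to load at all. The sketch below filters out plugins with unwanted issues and orders the rest by priority, without calling any entry point. The ``select`` function, the ``SimpleNamespace`` stand-ins for entry point objects, and the issue ID ``gpl`` are all hypothetical.

```python
import json
from types import SimpleNamespace


def select(entry_points, rejected_issues):
    """Return names of acceptable plugins, highest priority first."""
    candidates = []
    for ep in entry_points:
        try:
            meta = json.loads(ep.name)
        except ValueError:
            continue  # legacy name: no metadata to inspect
        if not isinstance(meta, dict):
            continue
        # Each issue is an [ID, issue-specific info] pair.
        issue_ids = {issue[0] for issue in meta.get("I", [])}
        if issue_ids & rejected_issues:
            continue  # skip without ever loading the plugin
        candidates.append((meta.get("P", 0), meta.get("N", ep.name)))
    return [name for _, name in sorted(candidates, key=lambda c: -c[0])]


# Stand-ins for entry points as seen by a legacy pkg_resources.
eps = [
    SimpleNamespace(name='{"N":"a","P":1,"I":[["gpl",null]]}'),
    SimpleNamespace(name='{"N":"b","P":5}'),
    SimpleNamespace(name='{"N":"c","P":2}'),
]
chosen = select(eps, {"gpl"})
```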

Rejected Ideas

  • Using another serialization scheme. The counter-arguments are:
    • other schemes are not forward-compatible. To keep forward compatibility, we are restricted in the set of symbols that can be used in the serialized metadata. So we cannot use more efficient serialization schemes that keep strings intact and just encode their length, such as bencode [BENCODE] or CBOR [CBOR]. We also cannot use serializations that rely on multiline structure, such as YAML [YAML].
    • it is already planned to use JSON in PEP 426 [PEP426];
    • JSON is widespread and is already in the standard library.
  • Putting the JSON metadata into a field other than the name would break forward compatibility.
  • Putting the JSON metadata before or after the name with some delimiter: it actually works and used to be implemented this way for some time. The issue is that it invents its own format within the name, which requires more custom code.
  • Using another character as a separator is also possible.
  • Encoding the entry point name into the serialized metadata, so that the first part of the declaration consists entirely of JSON, with the name under the key "N" and the rest of the metadata under the key "E". This may provide far better extensibility without breaking forward compatibility, while breaking backward compatibility a bit. Whether it is feasible needs more discussion.
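The single-line property that rules out multiline formats can be checked with the standard library. In this small sketch (the metadata values are made up), ``json.dumps`` without ``indent`` escapes the newline inside the string, so the serialized metadata stays on one line and still round-trips.

```python
import json

# Hypothetical metadata whose "E" field contains a multiline string.
meta = {"N": "demo", "E": {"note": "line one\nline two"}}

# Without `indent`, json.dumps emits no raw newlines: the newline inside
# the string is written as the two-character escape sequence \n.
encoded = json.dumps(meta)

single_line = "\n" not in encoded
round_trips = json.loads(encoded) == meta
```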

Reference Implementation

See Forward Compatibility.

References

[BENCODE] The BitTorrent Protocol Specification https://www.bittorrent.org/beps/bep_0003.html

[CBOR] RFC 7049 - Concise Binary Object Representation https://tools.ietf.org/html/rfc7049

[EPs] Entry points specification https://packaging.python.org/specifications/entry-points/

[JSON] RFC 8259 - The JavaScript Object Notation (JSON) Data Interchange Format https://tools.ietf.org/html/rfc8259

[JSONSchema] JSON Schema Specification https://json-schema.org/specification.html

[PEP426] PEP 426 -- Metadata for Python Software Packages 2.0 https://www.python.org/dev/peps/pep-0426/

[YAML] YAML Ain’t Markup Language (YAML™) Version 1.2 https://yaml.org/spec/1.2/spec.html

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
