@jjhelmus
Last active January 7, 2019 18:54

Conda packages contain a metadata file (index.json) which provides information about the package and the software contained within. The metadata for all packages within a channel is combined into a single JSON file (repodata.json), which conda uses for package discovery and to resolve and verify dependencies when a package is installed into or removed from an environment.

Unfortunately, the metadata contained within a package may be incorrect, either because of a mistake made when the package was created or because new information makes the original metadata incomplete.

Updating the metadata within a package would change the contents and md5 hash of the file. This is not an option, as the md5 hash of the package is provided on the download page and users expect it to remain constant. Updating the metadata in the repodata.json file is a better solution, although it is not without some difficulties.

This document provides a specification for communicating changes to a package's metadata via an "update" JSON file. These changes should be applied to the information in repodata.json. The JSON files making these changes should be stored and made available to everyone who has access to the channel. It should always be possible to recreate the contents of the repodata.json file by combining the metadata within each package in the channel and then applying the changes specified by the corresponding update files.
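
As a rough illustration of the first half of that process, the sketch below builds the unpatched entry for a single package. It rests on assumptions that are not stated in this document: that the metadata file sits at info/index.json inside the .tar.bz2 archive, and that the md5 and size entries are computed from the archive rather than stored in index.json. The function name repodata_entry is hypothetical.

```python
import hashlib
import json
import os
import tarfile


def repodata_entry(package_path):
    """Build the unpatched repodata.json entry for one .tar.bz2 package."""
    # Assumption: the package metadata lives at info/index.json in the archive.
    with tarfile.open(package_path, "r:bz2") as archive:
        with archive.extractfile("info/index.json") as handle:
            entry = json.load(handle)

    # Assumption: md5 and size describe the archive itself, so they are
    # computed here rather than read from index.json.
    digest = hashlib.md5()
    with open(package_path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            digest.update(chunk)
    entry["md5"] = digest.hexdigest()
    entry["size"] = os.path.getsize(package_path)
    return entry
```

Calling this for every package in a channel and keying the results by filename would give the base repodata.json contents to which update files are then applied.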

The specification is as follows. Each file contains JSON-formatted data with the following entries. Lines starting with '#' are comments which provide details of the standard and must not be included in the file.

{
    ### These are required entries. An update without these entries will be
    ### rejected by the update tool and an error will be raised.

    # update_version : the update specification version. Future changes to the
    # specification should increment this value.
    "update_version": 1,

    # update_number: an integer starting at one that is incremented each time
    # the same package is updated. When multiple updates exist for a package,
    # only the update with the largest update_number is applied. If two updates
    # for a package have identical update_number values, neither is applied and
    # an error is raised.
    "update_number": 1,

    # The date on which the update is made in YYYY-MM-DD format.
    "update_date": "2017-08-10",

    # A text comment describing the change.
    "update_comment": "Comment describing change",

    # The filename of the package being updated
    "package": "opencv-2.4.10-np110py27_1.tar.bz2",

    ### These entries are optional.  If they are included they must match the
    ### data in the package being updated.  If they do not, the update will not
    ### be applied and the tool will raise an error.

    # the build string of the package
    "build": "py27cuda7.5cudnn6.0_0",

    # the build number of the package
    "build_number": 0,

    # the date (YYYY-MM-DD) when the package was created
    "date": "2017-08-01",

    # the md5 of the file
    "md5": "ecc64cc965fe8a09ab6bcd42bf500b81",

    # the package name
    "name": "tensorflow-gpu",

    # the size of the package in bytes
    "size": 77056015,

    # the package version number
    "version": "1.2.1",

    ### These entries are optional.  If they are included they overwrite the
    ### entries of the same name in the package's metadata.

    # A list of packages that a package depends upon
    "depends": [
        "jpeg 8d",
        "libpng 1.6.17",
        "numpy 1.10*",
        "python 2.7*",
        "zlib 1.2*"
    ],

    # the package license
    "license": "BSD",

    # the package license family
    "license_family": "Apache",

    # a list of features that the package has
    "features": "",

    # a list of features tracked by the package
    "track_features": "",

    # a text summary of the package
    "summary": "",

    ### I'm not sure if updates to these keys would be needed.
    ### If they are, they can be added to the above list.
    "app_cli_opts": "",
    "app_entry": "",
    "app_type": "",
    "icon": "",
    "space_anchor": "",
    "type": ""
}

Note that an update overwrites only the listed entries that it provides. For any entry that is not provided, the information from the package is used.
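
The required-entry and must-match rules in the specification above could be checked along the following lines before anything is overwritten. This is only a sketch: the names REQUIRED_KEYS, MATCH_KEYS, UpdateError and validate_update are hypothetical and do not come from an existing tool, and the update is assumed to have been parsed into a Python dict already.

```python
# Entries that every update file must contain.
REQUIRED_KEYS = ("update_version", "update_number", "update_date",
                 "update_comment", "package")

# Optional entries that, when present, must equal the package's own data.
MATCH_KEYS = ("build", "build_number", "date", "md5",
              "name", "size", "version")


class UpdateError(ValueError):
    """Raised when an update file must be rejected."""


def validate_update(update, entry):
    """Check a parsed update file against the package's repodata.json entry."""
    missing = [key for key in REQUIRED_KEYS if key not in update]
    if missing:
        raise UpdateError("missing required entries: " + ", ".join(missing))

    for key in MATCH_KEYS:
        if key in update and update[key] != entry.get(key):
            raise UpdateError(
                "%s mismatch: update has %r, package has %r"
                % (key, update[key], entry.get(key)))
```

Raising a dedicated exception keeps a rejected update distinguishable from other failures in whatever tool applies the files.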

If multiple updates are made to the same package via update files with increasing update_number entries, only the entries in the update with the largest update_number are applied. For this reason it is recommended to start a new update by copying the existing update file so that any existing replacements are maintained. Additionally, the update tool should always read the metadata from the package prior to applying a new update in case an existing update has already been applied.
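
The selection rule can be sketched as follows. The function name select_updates is hypothetical, and the sketch takes one reading of the specification: any repeated update_number for a package is treated as an error, even when the repeated value is not the largest one.

```python
from collections import defaultdict


def select_updates(updates):
    """Return {package filename: update with the largest update_number}."""
    by_package = defaultdict(list)
    for update in updates:
        by_package[update["package"]].append(update)

    selected = {}
    for package, candidates in by_package.items():
        numbers = [u["update_number"] for u in candidates]
        # Identical update_number values make the choice ambiguous,
        # so no update is selected and an error is raised instead.
        if len(set(numbers)) != len(numbers):
            raise ValueError("duplicate update_number for " + package)
        selected[package] = max(candidates, key=lambda u: u["update_number"])
    return selected
```

Because only the selected update is applied, an update file that omits a replacement made by an earlier one silently drops it, which is why copying the existing update file is the recommended starting point.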

To provide a concrete example, the opencv-2.4.10-np110py27_1.tar.bz2 package has the following entry in repodata.json:

"opencv-2.4.10-np110py27_1.tar.bz2": {
    "build": "np110py27_1",
    "build_number": 1,
    "date": "2015-10-06",
    "depends": [
        "jpeg 8d",
        "libpng 1.6.17",
        "numpy 1.10*",
        "python 2.7*",
        "zlib 1.2*"
    ],
    "license": "BSD",
    "md5": "6b4bb1b8a55a735d68c554aebf0d9970",
    "name": "opencv",
    "size": 9670688,
    "version": "2.4.10"
},

The following update file could be used to correct the jpeg dependency version.

{
    "update_version": 1,
    "update_number": 1,
    "update_date": "2017-08-29",
    "update_comment": "Correct jpeg version",
    "package": "opencv-2.4.10-np110py27_1.tar.bz2",
    "md5": "6b4bb1b8a55a735d68c554aebf0d9970",
    "depends": [
        "jpeg 9*",
        "libpng 1.6.17",
        "numpy 1.10*",
        "python 2.7*",
        "zlib 1.2*"
    ]
}

The resulting repodata.json entry would then be:

"opencv-2.4.10-np110py27_1.tar.bz2": {
    "build": "np110py27_1",
    "build_number": 1,
    "date": "2015-10-06",
    "depends": [
        "jpeg 9*",
        "libpng 1.6.17",
        "numpy 1.10*",
        "python 2.7*",
        "zlib 1.2*"
    ],
    "license": "BSD",
    "md5": "6b4bb1b8a55a735d68c554aebf0d9970",
    "name": "opencv",
    "size": 9670688,
    "version": "2.4.10"
},
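
The opencv example can be reproduced with a short Python sketch of the overwrite step. The names OVERWRITE_KEYS and apply_update are hypothetical, and the key list simply mirrors the overwritable entries named in the specification; an actual tool might differ.

```python
import copy

# Entries that an update may overwrite, taken from the specification above.
OVERWRITE_KEYS = ("depends", "license", "license_family",
                  "features", "track_features", "summary")


def apply_update(entry, update):
    """Return a copy of a repodata.json entry with the update's entries applied."""
    patched = copy.deepcopy(entry)
    for key in OVERWRITE_KEYS:
        if key in update:
            patched[key] = copy.deepcopy(update[key])
    return patched


original = {
    "build": "np110py27_1",
    "build_number": 1,
    "date": "2015-10-06",
    "depends": ["jpeg 8d", "libpng 1.6.17", "numpy 1.10*",
                "python 2.7*", "zlib 1.2*"],
    "license": "BSD",
    "md5": "6b4bb1b8a55a735d68c554aebf0d9970",
    "name": "opencv",
    "size": 9670688,
    "version": "2.4.10",
}

update = {
    "update_version": 1,
    "update_number": 1,
    "update_date": "2017-08-29",
    "update_comment": "Correct jpeg version",
    "package": "opencv-2.4.10-np110py27_1.tar.bz2",
    "md5": "6b4bb1b8a55a735d68c554aebf0d9970",
    "depends": ["jpeg 9*", "libpng 1.6.17", "numpy 1.10*",
                "python 2.7*", "zlib 1.2*"],
}

patched = apply_update(original, update)
print(patched["depends"][0])  # jpeg 9*
```

Working on a deep copy leaves the metadata read from the package untouched, so the same starting point is available when a later update replaces this one.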

mcg1969 commented Aug 30, 2017

I would like to suggest that we include a history field as well. It can be optional, but it would be useful for documentation to store any previous updates within the current one.
