@IainNZ
Last active December 20, 2015 06:29

Julep 2: Package Requirements and Quality Standards

This is based on the discussion started here:

as well as the manual.

This Julep sets the rules for packages to improve the quality of the packaging ecosystem. All packages submitted to METADATA.jl must meet these requirements.

Requirements for the Package Itself

REQUIRE file

  • Packages must have a REQUIRE file that, in addition to listing all the packages it depends on, explicitly states the version of Julia it requires.
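
For illustration, a REQUIRE file for a hypothetical package might look like the following; the package names and version bounds are made up, and the exact version syntax is only indicative:

    julia 0.2-
    Distributions 0.2.0
    StatsBase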

License information

  • Packages should have license information, either in a dedicated LICENSE file or stated in a README.

Package Testing

  • Packages should have a test/ directory that contains a runtests.jl file.
  • The runtests.jl file will be called by an automated package ecosystem testing system. As such, it should not run long-running performance tests, and it should not require any configuration beyond what is automatically carried out during package installation. If a test fails, it should throw an error so that Julia exits with a nonzero error code (a minimal sketch follows this list).
  • Packages may contain other testing code of any nature in this folder; it will not be called.
  • Packages are encouraged to have a .travis.yml file for TravisCI integration.
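
A minimal runtests.jl along these lines might look like the sketch below; MyPackage and add_two are hypothetical names, and the point is simply that a failed check makes Julia exit with a nonzero code:

    # Minimal runtests.jl sketch; MyPackage and add_two are hypothetical names.
    using MyPackage

    # error() (and failed assertions) cause Julia to exit with a nonzero code,
    # which is what an automated ecosystem tester can check for.
    result = MyPackage.add_two(2, 2)
    result == 4 || error("add_two(2, 2) returned $result, expected 4")

    println("All tests passed")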

Requirements of the Package's METADATA.jl entry

Each package has a folder in METADATA.jl with the following contents:

url

  • Every package must have a url file in the root of its folder in METADATA.jl
  • The file should contain a single line that states the address of the package's git repository, e.g. git://github.com/JuliaStats/DataFrames.jl.git

DESCRIPTION.md

  • Every package must have a DESCRIPTION.md file in the root of its folder in METADATA.jl
  • The format of the file should be as follows:
    # Description
    A short description of the package
    # Keywords
    A comma separated list of keywords, e.g. distributions, probability, random normals, monte carlo
    # Maintainers
    A comma separated list of maintainers
    # Install notes
    A short description that will be displayed when the package is installed. Should be used only if manual operations must be performed to complete installation.
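
A filled-in example for a hypothetical package (all of the content below is illustrative, loosely modelled on the Gurobi.jl case discussed in the comments) could read:

    # Description
    A Julia interface to the Gurobi optimization solver
    # Keywords
    optimization, linear programming, solver
    # Maintainers
    Jane Doe <jane@example.com>
    # Install notes
    Requires a licensed Gurobi installation; set the GUROBI_HOME environment variable before using the package.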

versions

@johnmyleswhite

Thanks for pushing this forward.

Just to make sure we're on the same page: your NOTES file is a stopgap until BinDeps can handle installation of dependencies across common platforms?

@mlubin

mlubin commented Jul 26, 2013

Nice. I don't think we should make this a strict requirement for accepting packages into METADATA. We can strongly encourage people to do it by having an automated system to run the tests, and the package listing on the web could display an icon indicating if the tests pass or not. This should shame the package maintainers into implementing this framework.

@IainNZ

IainNZ commented Jul 26, 2013

@johnmyleswhite, I think BinDeps will address most people's needs, but I think NOTES will still be useful even if BinDeps is perfect. Consider Gurobi.jl - it depends on a commercial package and requires setting an environment variable. You have to go read the package README, and judging from mailing list posts, this is not something everyone does. I see this as another way to catch that.

@mlubin, so if a package doesn't implement a test/ folder at all, what would you display? And what about the stuff about requiring a dependency on a Julia version? I feel fairly strongly that should be mandatory from here on out - it's not a big ask.

Any thoughts on the testing plan itself? It's pretty different from the original Julep posting...

@staticfloat

I think dependency on a Julia version should be mandatory. Testing should not be. Packages with tests can have Pass/Fail, and packages without tests should have an N/A graphic.

I think eventually it would be nice to have a standardized testing framework. We have our own already put together in test/ and another one inside test/perf, but something that freaks out, prints an informative error and returns nonzero when something is wrong seems sufficient right now.

Viral has been talking to me about creating a nightly "package test run" where I iterate over all packages in METADATA.jl, run their tests and upload the status to a webserver somewhere. This should be easy enough, and again, the status would have to be a 3-state status: Pass/Fail/No Test Defined.
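
A rough sketch of what such a nightly run could look like; the install location, the exact status labels, and the upload step are all assumptions:

    # Iterate over installed packages, run test/runtests.jl, record a 3-state status.
    pkgdir = joinpath(homedir(), ".julia")              # assumed install location
    for pkg in sort(readdir(pkgdir))
        isdir(joinpath(pkgdir, pkg)) || continue
        testfile = joinpath(pkgdir, pkg, "test", "runtests.jl")
        if !isfile(testfile)
            status = "No Test Defined"
        elseif success(`julia $testfile`)               # nonzero exit code means failure
            status = "Pass"
        else
            status = "Fail"
        end
        println("$pkg: $status")                        # a real system would upload this
    end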

@IainNZ

IainNZ commented Jul 26, 2013

Also relevant: http://www.youtube.com/watch?feature=player_embedded&v=1C5-A-BgPM8 around 24:30 - talking about Node's NPM, where they score modules using a heuristic to indicate package quality.

@staticfloat

Also also relevant, the discussion in JuliaLang/julia#3540 (comment). I think it would be neat to have a provision for a "description" for each package a la apt-get, so that a Pkg2.search() function can give people a better shot at finding things that aren't named exactly what they're looking for.

If we do put this in the Julep, is it unreasonable to demand it be mandatory? I don't think so, as these are pretty basic requirements, and not a huge strain to put on package developers.
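
A hypothetical sketch of the kind of lookup this would enable; the function name and the per-package DESCRIPTION.md layout are assumptions, not an existing Pkg2 API, and the code is written against present-day Julia for clarity:

    # Naive keyword search over per-package DESCRIPTION.md files in a METADATA checkout.
    # e.g. search_descriptions(joinpath(homedir(), ".julia", "METADATA"), "monte carlo")
    function search_descriptions(metadata_dir, query)
        hits = String[]
        for pkg in readdir(metadata_dir)
            desc = joinpath(metadata_dir, pkg, "DESCRIPTION.md")
            isfile(desc) || continue
            if occursin(lowercase(query), lowercase(read(desc, String)))
                push!(hits, pkg)
            end
        end
        return hits
    end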

@IainNZ

IainNZ commented Jul 29, 2013

I killed the NOTES file and baked it into the DESCRIPTION.md from JuliaLang/julia#3540

@ViralBShah

We also need to have a METADATA tools version to track changes in these fields. It would also be great to automatically file issues when we detect failed tests.

@staticfloat

I like this, and we should definitely update Example.jl once we reach a consensus to illustrate that, e.g., you don't have to have separate code paths for ALL, INSTALL, TEST, etc.; for simple packages, they can all be equivalent. I have no further suggestions on this, thanks for stepping up and doing it!

@IainNZ

IainNZ commented Jul 29, 2013

I have no idea how to proceed with this, but it seems to be not very controversial, at least. So I'll start submitting pull requests to things I guess :D

@IainNZ

IainNZ commented Jul 30, 2013

I've updated Example.jl
https://github.com/JuliaLang/Example.jl
using the new testing ideas above.

I notice that Example.jl has REQUIRE too - we now have REQUIRE files in 3 places (in .julia/, in individual package entries in METADATA.jl, and now possibly in the package itself). There is also a VERSION in there too. What's the deal with this? Is this a standard that didn't catch on?

@mlubin

mlubin commented Jul 30, 2013

@staticfloat, I think N/A for not having tests is too neutral. Not having tests is about as bad as failing tests. Also, there's a lot of noise with Travis tests, and I don't think we should publicly shame package maintainers if they quickly address test failures. The purpose of displaying test status on package listings is to give users a sense of the quality of the package; more detailed testing status can be displayed on the package's GitHub page.

With that in mind, I think there should be three states displayed on the package listing:

  • Passed tests within the past week
  • Failed tests for more than a week
  • Failed tests for more than a month / package doesn't have tests

I think this will smooth out the test failures and still provide useful information to users. Anyway, this doesn't really impact the design of the testing infrastructure and can easily be tweaked later.

How should the INSTALL test interact with Pkg? What should it trigger if it fails?

@mlubin

mlubin commented Jul 30, 2013

Another question: how will this affect packages with binary dependencies that can't be installed in a testing environment? For example, Gurobi.jl requires a proprietary library and license to use. I can also imagine cases where compiling the binary dependencies times out on travis.

@mlubin

mlubin commented Jul 30, 2013

I'm talking to myself, but perhaps there should be a field in DESCRIPTION.md that indicates that the package cannot be tested due to external dependencies.

@IainNZ

IainNZ commented Jul 30, 2013

I guess if the INSTALL test fails it should abort, just like if BinDeps failed. Seems rational?
One way to solve the 'can't test' issue would be to have runtests.jl exit with a specific error code. Alternatively, it could just have a no-op runtests.jl and put a note in the DESCRIPTION.md description as a caveat?
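
A sketch of the exit-code idea; the GUROBI_HOME check and the code value 2 are assumptions, chosen only to show the mechanism:

    # At the top of runtests.jl: bail out with a designated exit code when an
    # external dependency that can't be installed automatically is missing.
    if !haskey(ENV, "GUROBI_HOME")      # hypothetical check for a commercial dependency
        println("Gurobi not available; skipping tests")
        exit(2)                         # distinct from 0 (pass) and 1 (generic failure)
    end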

@staticfloat

We have to think about what the tests are actually good for; what we want to accomplish with them. There are a couple different classes of tests, which we are differentiating via the INSTALL, TEST, etc.... modifiers:

  • INSTALL should be tests that ensure that an installation went properly, and the package is ready to be used. Failure means that the package is non-functional, and I agree with @IainNZ, the package should be removed. This should (hopefully) be pretty hard to do, barring cases such as Gurobi where a binary dependency must be present.
  • TEST should be tests that ensure that a package is self-consistent. This is stuff like making sure 2 + 2 == 4, etc... These are the kinds of tests that we'll run to ensure that a new Julia version doesn't screw stuff up inside this package, and also that any binaries that were linked to are working properly. I think it's reasonable to not run these on package install and only when requested. These are run by Travis.
  • PERF I'm not 100% sold on, but having it there as an option doesn't hurt much.

If a test can't be run on Travis or has external (non-open-source) requirements, the author will either have to remove the package's .travis.yml, or define tests that do nothing/print out warnings but still pass.

Having typed all this out, I suddenly feel like INSTALL tests are really nothing more than what is handled by BinDeps and Pkg2, and the separation is not necessary. Can someone give me a good reason why we'd want to have INSTALL tests vs. TEST tests?
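
One way the ALL/INSTALL/TEST/PERF modifiers described above could map onto a single runtests.jl is to dispatch on a command-line argument; the scheme below is only an illustration of the idea, not a settled convention:

    # Pick the test class from ARGS[1]; default to running everything.
    mode = isempty(ARGS) ? "ALL" : uppercase(ARGS[1])

    if mode == "INSTALL" || mode == "ALL"
        # quick post-install sanity check: the package loads and basic calls work
        @assert 2 + 2 == 4
    end
    if mode == "TEST" || mode == "ALL"
        # the full self-consistency suite would go here
    end
    if mode == "PERF"
        # long-running performance benchmarks, never run by default
    end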

@johnmyleswhite

I'm happy with the direction this is heading. I think the Pass, Fail, NA option for tests passing is fine.

I really like Viral's idea of automatically submitting issues.

@IainNZ

IainNZ commented Aug 1, 2013

@staticfloat, the only reason to separate INSTALL and TEST is that some packages' full test suites might take too long to run, slowing down package installation. I imagine running the DataFrames.jl test suite takes a non-trivial amount of time (does it?). If this is a non-factor, we should combine them, because the separation definitely adds complexity.

Automatic issue submission is awesome.

Given we're looking at rolling this out post 0.2, or at least not blocking 0.2 on it, I'm going to take some time prototyping some code in https://github.com/IainNZ/PackageEvaluator.jl

@IainNZ

IainNZ commented Aug 1, 2013

Example output from the script so far:


Package Analysis Results

REQUIRE file

  • Requirement: packages must have a REQUIRE file
    • ✓ Passed (+1.0)
  • Requirement: REQUIRE file specifies a Julia version
    • ✗ Failed!

Licensing

  • Recommendation: Packages should have a license
    • ✓ Passed (+1.0)
    • Detected license in JuMP.jl/LICENSE.md: 0.0

Summary

  • Total score: 2.0
  • One or more requirements failed - please fix and try again.
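
For flavour, a rough sketch of how a check like the REQUIRE one above might be implemented; this is not the actual PackageEvaluator code, and the scores are only meant to mirror the +1.0-per-check style of the output:

    # Check that a package has a REQUIRE file and that it pins a Julia version.
    function check_require(pkgpath)
        reqfile = joinpath(pkgpath, "REQUIRE")
        isfile(reqfile) || return (0.0, "no REQUIRE file")
        has_julia = any(startswith(strip(line), "julia") for line in eachline(reqfile))
        return has_julia ? (2.0, "REQUIRE present and Julia version specified") :
                           (1.0, "REQUIRE present, but no Julia version requirement")
    end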

@mlubin

mlubin commented Aug 2, 2013

What tests would one realistically want to do for INSTALL besides checking that julia and external dependencies are properly installed? I agree with @staticfloat that this is very similar to what's already guaranteed by Pkg2 and BinDeps.

@IainNZ

IainNZ commented Aug 2, 2013

@mlubin so no INSTALL test option then? Just TEST, PERF, and ALL options? If BinDeps is good enough, I don't really have any good reasons for INSTALL; anything I could think of is already covered by that.

@IainNZ

IainNZ commented Aug 3, 2013

You know, the more I thought about it - I think we are very unlikely to run centralized performance testing, so why should we even bother creating a requirement to provide that functionality? Instead, we just say: there is a file, you should provide it, the tests in it will be run - don't put performance tests in there. Minimal set of requirements = best. Thoughts?

@aviks

aviks commented Aug 3, 2013

Minimal set of requirements = best

Absolutely agree

@aviks

aviks commented Aug 3, 2013

@IainNZ Is it your intention to add PackageEvaluator to the METADATA.jl test suite? Then all PRs for new packages will get automatically tested on Travis.

@IainNZ

IainNZ commented Aug 3, 2013

Right now it's just experimental, but yes, if we can get everyone to agree on the above requirements, it will.
My timeline is vaguely:
a) People seem fairly OK with the Julep, so
b) I wrote PackageEvaluator to give people a feel for how they are doing.
c) Announce: packages must meet these clear rules, and should probably do these other things too. See how you are doing using PackageEvaluator.
d) After x weeks, put it into the METADATA.jl test suite. It'll almost surely fail on some unmaintained packages. We'd try to address all the troublesome packages before doing this, though.
e) Do we kick out bad packages?
f) In parallel, we have the package ecosystem testing stuff. I think it'd be cool to have a score for each package based on its pass rate and PackageEvaluator score. I stole this idea from Perl CPAN, and I know that Node.js/NPM are planning on doing this too.

@IainNZ

IainNZ commented Aug 3, 2013

More attention is needed on the DESCRIPTION.md file; I don't know if I like the idea of doing it as Markdown and reinventing the wheel...
@johnmyleswhite was suggesting we take the R package format. I'd feel comfortable with JSON personally, but I think this is all very much personal taste, so maybe we should just semi-arbitrarily pick something.

@IainNZ

IainNZ commented Aug 5, 2013

Another thought: the description should probably go in the package repository as well as in METADATA. I dislike the duplication...

@johnmyleswhite

The R DESCRIPTION file format is actually a Debian format. With trivial modifications, it can be treated as YAML.
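
For comparison, the same kind of metadata expressed as a Debian-control/DCF style record (the content is illustrative):

    Package: Distributions
    Description: Probability distributions and associated sampling routines
    Keywords: distributions, probability, random, monte carlo
    Maintainer: Jane Doe <jane@example.com>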

@mlubin

mlubin commented Aug 20, 2013

An idea for the testing infrastructure: http://jenkins-ci.org/. While Travis-CI is great for what it does, I think we need something a bit more flexible that we can run nightly with any combination of julia versions/package versions (metadata vs. master) that we like. This means we also need our own hardware.
