Over the last 18 months, we have seen a rise in the usage of the notebook aspect of the LSST Science Platform. As more users have come to the platform, the number of notebooks has also increased. This has happened relatively organically with only passing thought lent to how to organize, grow, and test the ecosystem of tools developed through notebooks.
This document is intended as a starting place for generating a set of policies and recommendations for teams using notebooks to do analysis and produce tooling.
Notebooks provide a nice environment for exploratory and pedagogical code development. Some reasons they are good for this type of work are:
- They incorporate tab completion and easy access to help information
- Documentation can reside with the code via markdown cells
- Cells can be executed in arbitrary order, so the entire script doesn't need to be run to iterate on a particular part of the notebook
- Code in cells can be flexible. E.g. one cell can execute bash commands while the next cell contains python
The same reasons notebooks perform well in exploratory scenarios are the reasons they are somewhat non-optimal for rigorous software engineering. A concrete example is that running cells in arbitrary order makes it easy to end up with a notebook in a state where it will not successfully complete when run sequentially. This makes testing tricky.
Commissioning and science validation are, by their nature, exploratory, analytical exercises. This makes notebooks a natural place to start. However, we expect initial exploratory efforts to result in tooling we will want to execute over and over, including potentially in a batch system. With these two competing desiderata, we need a plan for how to transition between the two contexts: exploration vs. concrete tooling.
I will not go into aspects that are part of the standard DM development workflow here. Instead I will focus on the aspects specific to development with notebooks.
As a general rule, notebooks should be kept in a single git repository for each broad activity. E.g. a repository for commissioning and a repository for DM Science Validation.
The repositories should have a defined structure. Informed by the data management git repositories, I suggest the following:
notebooks/wip
-- A directory that holds notebooks that are works in progress. These need not adhere to any specific quality specifications, but should have enough documentation to be readable by someone else.notebooks/prod
-- Notebooks meant to adhere to specific quality guidelines. Notebooks cannot be added toprod
fromWIP
without a review.bin
-- A place to put command line tools. Command line tools are constructed from python modules as much as possible to improve test coverage. I.e. any code logic should be factored into functions/methods in modules in thepython
directory.python
-- Python library code goes here.tests
-- A place for unit tests of the modular python code.
These repositories will be eups
packages so that we can use all the existing infrastructure.
Notebooks should be reviewed by someone other than the original author. The review should look for several things:
- The notebook should adhere to a standard templated format (not defined here). At the very least, the notebook should state the end product from the notebook, the responsible author, and the version of the stack it is compatible with (perhaps computed by the CI system).
- When possible, the notebook should run sequentially from top to bottom.
- The reviewer should also look at the markdown cells for accuracy.
- Any duplication of code between this and another notebook should be identified and factored out into a module.
- Reviews should attempt to promote accepted idioms for common tasks.
Notebook repositories should be testable.
We do not have a notebook continuous integration system yet, but when it exists, these repositories should regularly have both unit tests and notebook tests run.
This only applies to notebooks in notebooks/prod
.
Be aggressive about moving common code into python modules rather than letting multiple copies bit rot in separate notebooks.
Have a concrete plan for what to do about notebooks that do not pass testing.
At some point they should be moved back to notebooks/WIP
, but whether that is immediate or there is a grace period for the author to fix the notebook should be up to the repository owner.
Whether notebooks are cleared before committing, will depend on the intentions of the team using the notebook repository.
- Clearing outputs will reduce the overall size, make the diff cleaner, and makes the notebook clean for the next person to execute
- Leaving outputs in the committed notebook will save time since the notebook does not need to be executed to see plots etc. Leaving outputs also allows for robust testing since
nbeval
can compare outputs from the tested notebook match those of the input notebook.
Notebooks should specify what platform they are intended for: e.g. LSP at LDF, lab, etc.
Notebooks in notebooks/prod
must be executable on the LSP at the LDF since that is where continuous integration will be carried out.
Strive to use a common style in code cells. The DM team uses the PEP-8. There are many auto-formatters out there.
The general guideline is to keep module libraries in the same repository as the notebooks which use them. However, it is obvious there will be situations where different repositories will want to share imported code. For example, the commissioning team and science validation team will certainly have overlapping analysis tasks that could make use of shared code. In these cases, we suggest have another repository of utility/library code that is a dependency of both the higher level notebook repositories. This will require coordination on code review to reduce duplication.