Some definitions first, since precise terminology matters a lot here.
- ingest: move/copy/symlink files from outside a butler repository and `butler.ingest` them into a butler repository, adding `Dimension` data to the `registry` as necessary.
- put: write in-memory objects to a butler repository using `butler.put`. This is distinct from `ingest` because the butler handles all `registry` updates, and the objects have to be in memory rather than on-disk files.
- export/import: read data from one butler repository and put it into a different butler repository.
- raw: an unmodified image from a camera (including bias, dark, flat, science, etc.).
- calibrations: data used to convert raw science images into processed images (including master bias/dark/flat images, defect lists, linearity corrections, crosstalk corrections, and brighter-fatter kernels).
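To make the distinction between ingest, put, and export/import concrete, here is a toy model in Python. It is a hypothetical sketch, not the daf_butler API: the `ToyRepo` class, its method names, and the string-valued "dimension records" such as `"detector:0"` are all invented for illustration. It encodes only the differences stated in the definitions: `ingest` starts from files already on disk and may add dimension records, `put` starts from in-memory objects and leaves file placement and registry bookkeeping to the repository, and export/import moves selected datasets from one repository into another.

```python
import json
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ToyRepo:
    """Hypothetical toy model of a butler repository (NOT the daf_butler API).

    A directory of files plus a 'registry' mapping dataset names to files and
    to the dimension records (e.g. "detector:0") they are associated with.
    """
    root: Path
    registry: dict = field(default_factory=dict)
    dimensions: set = field(default_factory=set)

    def ingest(self, name, src, dims, transfer="copy"):
        """ingest: bring an existing on-disk file into the repo.

        Dimension records are added to the registry as necessary.
        """
        src = Path(src)
        dest = self.root / src.name
        if transfer == "copy":
            shutil.copy(src, dest)
        elif transfer == "move":
            shutil.move(str(src), str(dest))
        elif transfer == "symlink":
            dest.symlink_to(src.resolve())
        else:
            raise ValueError(f"unknown transfer mode: {transfer!r}")
        self.dimensions.update(dims)
        self.registry[name] = {"path": str(dest), "dims": sorted(dims)}

    def put(self, name, obj, dims):
        """put: write an in-memory object; the repo picks the file location
        and performs the registry update itself.

        In this toy, the dimension records must already be known.
        """
        unknown = set(dims) - self.dimensions
        if unknown:
            raise KeyError(f"unknown dimension records: {sorted(unknown)}")
        dest = self.root / f"{name}.json"
        dest.write_text(json.dumps(obj))
        self.registry[name] = {"path": str(dest), "dims": sorted(dims)}

    def export(self, names):
        """export: bundle the registry entries for the selected datasets."""
        return {name: dict(self.registry[name]) for name in names}

    def import_(self, bundle):
        """import: copy exported files into this repo and register them."""
        for name, entry in bundle.items():
            src = Path(entry["path"])
            dest = self.root / src.name
            shutil.copy(src, dest)
            self.dimensions.update(entry["dims"])
            self.registry[name] = {"path": str(dest), "dims": list(entry["dims"])}


def demo():
    """Ingest a raw, put a calibration, then export/import it to a second repo."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        repo_a, repo_b = ToyRepo(tmp / "a"), ToyRepo(tmp / "b")
        repo_a.root.mkdir()
        repo_b.root.mkdir()
        raw = tmp / "raw_0001.fits"
        raw.write_bytes(b"not really a FITS file")
        repo_a.ingest("raw_0001", raw, {"exposure:1", "detector:0"})
        repo_a.put("bias_det0", {"level": 1000.0}, {"detector:0"})
        repo_b.import_(repo_a.export(["bias_det0"]))
        bias = json.loads(Path(repo_b.registry["bias_det0"]["path"]).read_text())
        return sorted(repo_a.registry), sorted(repo_b.registry), bias, sorted(repo_b.dimensions)
```

In this sketch the second repository ends up with the calibration but not the raw, which mirrors the export/import route for users who do not want to run `cp_pipe` themselves.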
The only ingest task that will exist is `lsst.obs.base.RawIngestTask`. Once raw files are ingested into a butler, `cp_pipe` is used to generate the calibrations used for processing.
If a user does not want to run `cp_pipe` to produce calibrations themselves (due to disk space or time limitations), they will export the desired calibrations from an existing butler repository and import them into their own. This is the only supported route by which calibrations can get into a repository without running `cp_pipe`.
To help us transition from gen2, we will provide a version of the `gen2convert.py` script to create working gen3 repositories for the existing test data that have externally-supplied calibration data. This is not a long-term solution; it only exists to maintain our existing test data.
`butler.export` is not fully supported currently. According to @JimBosch, these exports currently cannot be imported into a repository in a different directory. There is no ticket for this at present.
This approach does not allow us to ingest externally-supplied calibrations. For existing test datasets (e.g. `ap_verify`), the workaround would be to make the existing `gen2convert` code work on any gen2 repo (not just HSC), and use that to bootstrap gen3 repos with existing external master calibration data.
This proposal does not support user-tweaked master calibrations. For example, someone notices a chip defect that isn't caught by `cp_pipe` and wants to add it to the defect list; `cp_pipe` would have to be updated to correctly identify that defect.
It's not clear to me what the current system for dealing with this is. What is the source of truth: the `obs_*_data` git repo, or some particular on-disk butler somewhere? This proposal sidesteps that issue by having one source of truth: `cp_pipe`.
The `obs_*_data` packages would either become obsolete and be removed, or would have to contain `butler.export`ed repository subsets instead of "user-curated text files". If we do want a place where users can download master calibrations supported by LSST, we could use these for that purpose.
Dealing with the Camera Geometry is a fundamental part of this question: linearity, crosstalk, gains, and overscan size could each change over time. Potentially anything in the `Amplifier` object other than the rawBBox/rawDataBBox could change over time.
However, I don't think decisions about what to do with the Camera Geometry affect whether this proposal is workable: we don't currently have the ability to version parts of the camera geometry, and updates to those values should come from DM code (e.g. `cp_pipe`, `jointcal`, `fgcm`), which is consistent with this overall proposal (that code will read from a butler and write back to it).
Spelling of Jim's name.
I think the plan is to retire daf_persistence, but gen2convert depends on daf_persistence. If the long-term plan really is to keep gen2 repos around, gen2convert might need to pull out the critical pieces of daf_persistence that it needs.
I don't understand why we need to get rid of the obs_*_data packages. We designed them specifically to make gen3 easier. The entire point is that they are text files that are easy to edit and whose changes are easy to track, so I don't understand how a gen3 butler export would work here, since the butler versions are the binary files that we explicitly stopped using in the _data packages.
Regarding the camera, I would much prefer it if the camera were the detector layout in the focal plane, and maybe the geometry and things like gains were tracked in separate objects with their own distinct lifespans.
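A minimal sketch of that separation, assuming validity periods keyed on date: the static layout (here just an amplifier name and its raw bounding boxes, the parts described above as not expected to change) lives in one immutable object, while each time-varying quantity such as a gain carries its own list of validity periods. The class names `AmpLayout`, `ValidityEntry`, and `TimeVaryingParam`, the bounding-box numbers, and the gain values are all hypothetical; this is not the afw cameraGeom API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AmpLayout:
    """Hypothetical static part of an amplifier: fixed for the life of the hardware."""
    name: str
    raw_bbox: tuple       # (x0, y0, x1, y1) in raw pixel coordinates
    raw_data_bbox: tuple


@dataclass(frozen=True)
class ValidityEntry:
    """One value of a time-varying quantity, valid from `valid_from` onward."""
    valid_from: date
    value: float


class TimeVaryingParam:
    """Hypothetical time-varying quantity (e.g. an amplifier gain) with its own lifespan.

    Each entry is valid from its start date (inclusive) until the next entry starts.
    """

    def __init__(self, entries):
        self._entries = sorted(entries, key=lambda e: e.valid_from)
        self._starts = [e.valid_from for e in self._entries]

    def at(self, when):
        # Index of the last entry whose start date is <= `when`.
        i = bisect_right(self._starts, when) - 1
        if i < 0:
            raise LookupError(f"no value valid on {when}")
        return self._entries[i].value


# Illustrative values only.
amp_c00 = AmpLayout("C00", (0, 0, 544, 2048), (10, 0, 522, 2002))
gain_c00 = TimeVaryingParam([
    ValidityEntry(date(2019, 1, 1), 1.70),
    ValidityEntry(date(2019, 6, 1), 1.72),
])
```

With this split, a new gain measurement appends one `ValidityEntry` without touching the layout object, so the two can have independent histories.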
Simon has already dealt with the camera team's QE curves. They are not what is ingested into the repository (and I still can't quite understand why we thought pickle files with multiple sequential writes in them were a good interface).
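For context on why that interface is awkward: when several objects are written to one file with sequential `pickle.dump` calls, a single `pickle.load` returns only the first object, so the reader has to know to keep calling it until end-of-file. A minimal stdlib sketch; the `load_all` helper is hypothetical and the record contents in the comment are made up, not the camera team's actual QE format.

```python
import io
import pickle


def load_all(fileobj):
    """Read every object from a stream written by several sequential pickle.dump calls.

    A single pickle.load returns only the first object, so keep calling it
    until the stream is exhausted (signalled by EOFError).
    """
    objects = []
    while True:
        try:
            objects.append(pickle.load(fileobj))
        except EOFError:
            return objects


# Example: two records written one after another into the same buffer.
buf = io.BytesIO()
pickle.dump({"wavelength_nm": [400, 500]}, buf)
pickle.dump({"qe": [0.8, 0.9]}, buf)
buf.seek(0)
records = load_all(buf)
```

Nothing in the file itself says how many records it holds or what each one is, which is part of why it makes a poor exchange format.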