
@parejkoj
Last active January 10, 2020 19:27

New proposal for handling calibrations in gen3

Definitions

Some definitions, since precision in this terminology can matter a lot.

  • ingest: move/copy/symlink files from outside a butler repository into it with butler.ingest, adding Dimension data to the registry as necessary.
  • put: write in-memory objects to a butler repository using butler.put; this is distinct from ingest because the butler handles all registry updates and the objects must be in memory, not on-disk files.
  • export/import: read data from one butler repository and put it into a different butler repository.
  • raw: an un-modified image from a camera (including bias, dark, flat, science, etc.).
  • calibrations: data that is used to convert raw science images to processed images (includes master bias/dark/flat images, defect lists, linearity corrections, crosstalk corrections, brighter-fatter kernels).
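
The ingest/put distinction can be sketched with a toy model. This is purely illustrative pseudocode (ToyButler and its methods are invented for this sketch, not the real lsst.daf.butler API): ingest registers an existing on-disk file and records its dimension data, while put takes an in-memory object and lets the butler do all the bookkeeping.

```python
# Toy illustration of the ingest-vs-put definitions above.
# NOT the real lsst.daf.butler API; names here are hypothetical.

class ToyButler:
    def __init__(self):
        self.registry = {}   # dataset name -> dimension records
        self.datastore = {}  # dataset name -> stored object or file ref

    def ingest(self, path, dimensions):
        """Register an on-disk file, adding dimension data as needed."""
        name = path.split("/")[-1]
        self.registry[name] = dimensions        # registry update
        self.datastore[name] = f"file:{path}"   # file is moved/copied/linked
        return name

    def put(self, obj, name, dimensions):
        """Write an in-memory object; the butler owns all bookkeeping."""
        self.registry[name] = dimensions
        self.datastore[name] = obj
        return name

butler = ToyButler()
butler.ingest("/data/raw/bias-001.fits", {"exposure": 1, "detector": 42})
butler.put({"kernel": [0.1, 0.9]}, "bfKernel", {"detector": 42})
```

The point of the contrast: ingest starts from files that already exist outside the repository, whereas put only ever sees Python objects.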

Proposal

The only ingest that exists will be lsst.obs.base.RawIngestTask. Once raw files are ingested into a butler, cp_pipe is used to generate the calibrations that are used for processing.

If a user does not want to run cp_pipe to produce calibrations themselves (due to disk space or time limitations), they will export the desired calibrations from an existing butler repository and import them into their own. This is the only supported system by which calibrations can get into a repository without running cp_pipe.
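
The export/import flow between two repositories can likewise be sketched as a toy model (again hypothetical helper functions, not the real butler.export/butler.import API): exporting extracts the chosen datasets together with their registry records, and importing replays both into the destination repository.

```python
# Toy sketch of the export/import calibration flow described above.
# The function names and repo layout are invented for illustration only.

def export_datasets(source_repo, names):
    """Extract the named datasets plus their registry records."""
    return {n: (source_repo["registry"][n], source_repo["datastore"][n])
            for n in names}

def import_datasets(dest_repo, exported):
    """Replay exported datasets and registry records into another repo."""
    for name, (dims, obj) in exported.items():
        dest_repo["registry"][name] = dims
        dest_repo["datastore"][name] = obj

# A shared repo that already ran cp_pipe, and a user's empty repo.
shared = {"registry": {"bias": {"detector": 42}},
          "datastore": {"bias": "masterBias.fits"}}
mine = {"registry": {}, "datastore": {}}

import_datasets(mine, export_datasets(shared, ["bias"]))
```

Note that the export has to carry the registry (dimension) records along with the files; copying the files alone would leave the destination registry ignorant of them.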

To help us transition from gen2, we will provide a version of the gen2convert.py script to create working gen3 repositories for the existing testdata that have externally-supplied calibration data. This is not a long-term solution, but only exists to maintain our existing testdata.

Known issues

butler.export/import support

butler.export is not fully supported currently. According to @JimBosch these exports currently cannot go into a repository in a different directory. There is no ticket for this at present.

Externally-supplied calibrations

This approach does not allow us to ingest externally-supplied calibrations. For existing test datasets (e.g. ap_verify), the work-around would be to make the existing gen2convert code work on any gen2 repo (not just HSC), and use that to bootstrap gen3 repos with existing external master calibration data.

User-adjustments to master calibrations

This proposal does not support user-tweaked master calibrations. For example, someone notices a chip defect that isn't caught by cp_pipe and wants to add it to the defect list. cp_pipe would have to be updated to correctly identify that defect.

It's not clear to me what the current system for dealing with this is. What is the source of truth, the obs_*_data git repo or some particular on-disk butler somewhere? This proposal sidesteps that issue by having one source of truth: cp_pipe.

obs_*_data

The obs_*_data packages would either become obsolete and be removed, or would have to contain butler.exported repository subsets instead of "user-curated text files". If we do want a place where users can download master calibrations supported by LSST, we could use these for that purpose.

Camera geometry

Dealing with the camera geometry is a fundamental part of this question: linearity, crosstalk, gains, and overscan size could each change over time. Potentially anything in the Amplifier object could change over time other than the rawBBox/rawDataBBox.

However, I don't think that decisions about what to do with camera geometry affect whether this proposal is workable: we don't currently have the ability to version parts of the camera geometry, and updates to those values should come from DM code (e.g. cp_pipe, jointcal, fgcm), which is consistent with this overall proposal (said code will read from a butler and write back to it).

@mfisherlevine

Thanks @parejkoj, this is great. Comments on a first reading:

Definitions
Like the definitions for ingest, export/import and raw 👍 One minor point on the definition of calibrations: you say that these include "raw and master bias/dark/flats", but I'd slightly disagree. I'd say that the raw biases/darks/flats are not themselves calibrations (they are raws, as defined above) but are only raw inputs to the calibration products pipeline, and that only the outputs of that pipeline should be considered calibrations. I wouldn't have mentioned this if precision hadn't been specifically called out at the start of this section, but without that change a raw bias is both a raw and a calibration, which seems undesirable to me and would also be a caveat to things that follow (like all calibrations being created by cp_pipe, for example).

User-adjustments to master calibrations:
I think that there might need to be a place for somewhat hand-crafted calibrations (for example, and I'm just spitballing here, but e.g. the outputs of FGCM might always need to be made/tweaked by hand and then "ingested", or filter transmission maps, or something like that). However, I don't think this necessarily needs to break the suggested model: the "ingest" task for those would just be a task in cp_pipe that takes quote-unquote "raw" input (which is actually far from raw, i.e. it's the output of whatever hand-crafted analysis made the results) and then quote-unquote "processes this data and puts it", i.e. it reads it in, however that looks, and then put()s it. That makes it a de-facto ingest task, but one that could technically be considered in keeping with the rest of this model. Does that sound like a good approach to getting around these edge cases? (This is also what I was thinking about when we were talking about the camera-team-supplied QE curves, for example.)

@parejkoj
Author

To your last point: I suppose it does help answer the question of "where might 'ingest of externally-supplied or user-modified calibrations' live?" The answer is always cp_pipe (right now, such things are scattered between pipe_tasks and obs_base). FGCM output is already "butler putted", I believe; a "blessing" step for it might be valuable.

@mfisherlevine

Oh, sure, FGCM was just a wild guess as to something that might be like that, but I think, as I continued typing, that the QE curves are a much better one.

@timj

timj commented Dec 10, 2019

Spelling of Jim's name.

I think the plan is to retire daf_persistence, but gen2convert depends on daf_persistence, so if the plan really is to keep gen2 repos around for the long term then gen2convert might need to pull out the critical pieces of daf_persistence that it needs.

I don't understand why we need to get rid of the obs_*_data packages. We designed them specifically to make gen3 easier. The entire point is that these are text files that are easy to edit and whose changes are easy to track, so I don't understand how a gen3 butler export works here, since the butler versions are the binary files that we explicitly stopped using in the _data packages.

Re camera: I would much prefer it if the camera were the detector layout in the focal plane, and maybe the geometry and things like gains were tracked in separate objects that had their own distinct life spans.

Simon has dealt with the camera team QE curves already. They are not what is ingested into the repository (and I still can't quite understand why we thought pickle files with multiple sequential writes in them was a good interface).

@mfisherlevine

Re camera I would much prefer it if camera was detector layout in the focal plane and maybe the geometry and things like gains were tracked in separate objects that had their own distinct life spans.

Yes, 100%! They're no different to all the other things which have nominal values and which we remeasure, and should be able to be treated as such.

@mfisherlevine

and I still can't quite understand why we thought pickle files with multiple sequential writes in them was a good interface

I imagine the answer to this part is simply that they will barely ever be used, and so doing whatever is quickest and easiest was the right call. I don't think we should ever have had to do that work in the first place, from what I gleaned from a few chats here and there, so I think just getting it done is more than enough.

@kfindeisen

kfindeisen commented Jan 10, 2020

Once raw files are ingested into a butler, cp_pipe is used to generate the calibrations that are used for processing. If a user does not want to run cp_pipe to produce calibrations themselves (due to disk space or time limitations), they will export the desired calibrations from an existing butler repository and import them into their own.

So what does this mean for operations? Do we export/import calibrations (and templates) before running the prompt products pipeline? Do we do this at the start of the night, or on every readout?

For existing test datasets (e.g. ap_verify), the work-around would be to make the existing gen2convert code work on any gen2 repo (not just HSC), and use that to bootstrap gen3 repos with existing external master calibration data.

I think it's more important that the ap_verify input resemble the operations environment (hence my previous question) than that it resemble the Gen 2 ap_verify dataset framework.

@parejkoj
Author

For prompt products, I would expect that the PP butler repo would have the latest "blessed" versions of the calibrations produced by cp_pipe. They could have gotten there via the standard cp_pipe approach+blessing, or by export/import from some "master" butler repo. This would be done whenever master calibrations are updated (which is probably weekly to monthly-ish).

ap_verify would resemble the operations environment under the above scheme: we never run cp_pipe as part of ap_pipe, we just have a set of "blessed" master calibrations in the repo available for ap_pipe to use (either from an earlier run of cp_pipe in that repo, or from an export/import from another repo). Having a skeleton gen3 repo with master calibrations in it to export/import from to bootstrap the process for ap_verify seems like the right approach to me: it gives you something that looks like a typical ap_pipe starting point (a gen3 butler repo with master calibrations in it).

@mfisherlevine

Apparently you can't 👍 a comment on a gist, so consider this a 👍...
