- Subsetting repositories based on datasets and data IDs.
- Given this list of data IDs I need a coherent self-consistent standalone repo of PVIs and deep coadds.
    - This means the butler will need to infer `dataId`s for particular datasets based on the `dataId`s for others. E.g. infer coadd `tract` and `patch` from PVI `visit` and `ccdnum`.
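The inference step above could be sketched as a lookup from a PVI's `dataId` to the coadd `dataId`s it overlaps. The overlap table here is a toy stand-in for a real sky-map/WCS intersection, and all names (`infer_coadd_data_ids`, the `OVERLAPS` dict) are illustrative, not existing butler API:

```python
# Sketch: expand a PVI dataId (visit, ccdnum) into the coadd dataIds
# (tract, patch) it contributes to. A real implementation would intersect
# the detector footprint with the sky map; here a toy table stands in.
from typing import Dict, List, Tuple

# Toy overlap table: (visit, ccdnum) -> list of (tract, patch)
OVERLAPS: Dict[Tuple[int, int], List[Tuple[int, str]]] = {
    (903334, 22): [(8766, "3,4"), (8766, "3,5")],
}

def infer_coadd_data_ids(pvi_data_id: dict) -> List[dict]:
    """Return the coadd dataIds inferred from a PVI dataId."""
    key = (pvi_data_id["visit"], pvi_data_id["ccdnum"])
    return [{"tract": t, "patch": p} for t, p in OVERLAPS.get(key, [])]
```

The point of the sketch is that the butler, not the user, owns this expansion, so a request for "the coadds that go with these PVIs" can be resolved from the PVI `dataId`s alone.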
- We need a mechanism for discovering data based on multiple axes (good seeing, bad seeing, time-based, area of sky).
- As an example, the image characterization pipeline publishes relevant results (e.g. seeing) to the DBB; later pipelines should then be able to query based upon them.
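The publish-then-query flow above could look like the following. The in-memory list stands in for whatever database the DBB actually uses, and `publish`/`query_visits` are hypothetical names, not an existing DBB API:

```python
# Toy data-backbone (DBB) metadata table: one pipeline publishes per-visit
# seeing, a later pipeline selects visits along that axis.
from typing import List

DBB_METADATA: List[dict] = []

def publish(data_id: dict, **metrics) -> None:
    """Record pipeline-measured metrics alongside the dataId."""
    DBB_METADATA.append({**data_id, **metrics})

def query_visits(max_seeing: float) -> List[int]:
    """Discover 'good seeing' visits published by an earlier pipeline."""
    return [row["visit"] for row in DBB_METADATA if row["seeing"] <= max_seeing]

publish({"visit": 1}, seeing=0.7)
publish({"visit": 2}, seeing=1.4)
```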
- The I/O plugin needs to be configurable on a per-dataset basis.
- The example is that a user may want intermediate files to go to the local POSIX system, but may want end products to go to a more permanent storage location.
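A minimal sketch of that per-dataset routing, assuming a config that maps dataset types to storage backends (the `Datastore` classes, dataset-type names, and paths here are all illustrative):

```python
# Route intermediates to local POSIX storage and end products to a
# permanent store; the butler would consult this config per dataset type.
class PosixDatastore:
    root = "/scratch/run1"          # hypothetical local scratch area

class ObjectStoreDatastore:
    root = "s3://permanent-bucket"  # hypothetical permanent store

DATASTORE_CONFIG = {
    "calexp": PosixDatastore,           # intermediate: local disk
    "deepCoadd": ObjectStoreDatastore,  # end product: permanent store
}

def datastore_for(dataset_type: str, default=PosixDatastore):
    """Pick the storage backend configured for this dataset type."""
    return DATASTORE_CONFIG.get(dataset_type, default)
```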
- Datasets need to be easy to define. Every dataset need not implement persistence for every storage context (why would you write an RDBMS storage engine for `ImageF`?), but implementing persistence for "reasonable" storage contexts should be straightforward.
    - Examples from SQuaRE would be `lsst.verify.Measurement` or `lsst.verify.Job`.
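One way to sketch this: a dataset type registers formatters only for the storage contexts that make sense for it, and asking for an unsupported context fails loudly rather than forcing every dataset to implement every backend. The class and registry names are illustrative:

```python
# Selective persistence: a dataset declares formatters only for the storage
# contexts it supports; unsupported contexts raise instead of half-working.
class Measurement:
    """Toy stand-in for something like lsst.verify.Measurement."""
    formatters = {
        "posix-json": lambda obj, path: f"wrote {path}",
        # deliberately no "rdbms" entry: that backend is not implemented
    }

def persist(obj, context: str, path: str):
    """Persist obj via the formatter registered for the given context."""
    try:
        writer = type(obj).formatters[context]
    except KeyError:
        raise NotImplementedError(
            f"{type(obj).__name__} has no formatter for {context!r}")
    return writer(obj, path)
```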
- Another use case from the QA world: it would be nice to be able to specify datasets based on SQL (or ADQL) queries. If we have fixed schemas, it would allow us to say `Butler.get('high_snr_stars')` rather than `Butler.get('select * from src where psfFlux/psfFlux_err > 100 and isStar == 1')`.
- Sometimes we have outputs from QA that are related to measurements of metrics, but maybe not specific LSST classes (e.g. a whisker diagram of the PSF ellipticity as a function of focal-plane position). Do we care about being able to `Butler.put` binary blobs like that?
- We will want to access the same datasets from multiple runs using different code/configuration.
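The named-query idea could be as simple as a registry that maps a friendly dataset name to the SQL it expands to; the butler itself is not modeled here, and `resolve_query`/`NAMED_QUERIES` are hypothetical:

```python
# Named-query aliasing: Butler.get('high_snr_stars') would expand the alias
# to the registered SQL before hitting the database.
NAMED_QUERIES = {
    "high_snr_stars":
        "select * from src where psfFlux/psfFlux_err > 100 and isStar == 1",
}

def resolve_query(name_or_sql: str) -> str:
    """Return the SQL for a registered name, or pass raw SQL through."""
    return NAMED_QUERIES.get(name_or_sql, name_or_sql)
```

Fixed schemas matter here because the alias is only trustworthy if `src`, `psfFlux`, etc. mean the same thing across repos.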
- Will SQLite databases for the registry scale?
- It's clear that we need an I/O module that can interact with an object store (e.g. S3). This means the design should be flexible with respect to the limitations of the various systems: e.g. lack of atomic consistency on object stores, inode contention on GPFS.
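To illustrate the atomicity point: on POSIX a write can be published atomically via write-to-temp plus rename, whereas a plain object store generally lacks such a primitive, so it needs a commit marker or manifest instead. This sketch shows only the POSIX side:

```python
# Atomic publish on POSIX: readers see either the old file or the complete
# new one, never a partial write. Object stores need a different strategy.
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write data to path atomically (temp file + rename)."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)     # clean up the temp file on failure
        raise
```

The rename trick relies on temp file and destination living on the same filesystem, which is why the temp file is created next to the target rather than in the system temp directory.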