SimonKrughoff/SciPlat_usecases.md Secret

## SciPlat_usecases.md

      
Use cases for Science Platform


Subsection repositories based on datasets and data IDs.

Given this list of data IDs I need a coherent self-consistent standalone repo of PVIs and deep coadds.

This means the Butler will need to infer dataIds for particular datasets from the dataIds of others, e.g. infer the coadd tract and patch from the PVI visit and ccdnum.
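
As a rough illustration of that inference, here is a minimal sketch. It assumes a hypothetical overlap table keyed on (visit, ccdnum); the table, the function name `infer_coadd_ids`, and all ID values are made up for the example and are not part of any existing Butler API.

```python
# Hypothetical overlap table: which coadd (tract, patch) each PVI (visit, ccdnum)
# touches. In practice this would come from a registry or skymap; these values
# are invented for illustration.
OVERLAPS = {
    (1228, 40): {(8766, "4,4"), (8766, "4,5")},
    (1228, 41): {(8766, "4,5")},
    (1230, 40): {(8767, "0,1")},
}


def infer_coadd_ids(pvi_ids):
    """Return the sorted, de-duplicated (tract, patch) IDs overlapped by the PVIs."""
    coadd_ids = set()
    for data_id in pvi_ids:
        key = (data_id["visit"], data_id["ccdnum"])
        # Unknown PVIs contribute nothing rather than raising.
        coadd_ids |= OVERLAPS.get(key, set())
    return sorted(coadd_ids)


pvi_ids = [{"visit": 1228, "ccdnum": 40}, {"visit": 1228, "ccdnum": 41}]
coadds = infer_coadd_ids(pvi_ids)
print(coadds)
```

A standalone repo would then contain the listed PVIs plus exactly the coadd patches returned here.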


We need a mechanism for discovering data based on multiple axes (good seeing, bad seeing, time based, area of sky).

As an example, if the image characterization pipeline publishes relevant results (e.g. seeing) to the DBB, later pipelines should be able to query on them.
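
One way to picture multi-axis discovery is a small metadata table that earlier pipelines write to and later ones query. The sketch below uses an in-memory SQLite database; the table name `visit_meta`, its columns, and the helper `find_visits` are assumptions made for the example, not an existing schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-visit metadata published by an upstream pipeline.
conn.execute(
    "CREATE TABLE visit_meta ("
    "visit INTEGER PRIMARY KEY, mjd REAL, ra_deg REAL, dec_deg REAL, seeing REAL)"
)
conn.executemany(
    "INSERT INTO visit_meta VALUES (?, ?, ?, ?, ?)",
    [
        (100, 59000.1, 150.0, 2.0, 0.6),
        (101, 59000.2, 150.2, 2.1, 1.4),
        (102, 59010.3, 210.0, -5.0, 0.7),
    ],
)


def find_visits(conn, max_seeing=None, mjd_range=None, ra_range=None):
    """Select visits by any combination of seeing, time, and sky-position cuts."""
    clauses, params = [], []
    if max_seeing is not None:
        clauses.append("seeing <= ?")
        params.append(max_seeing)
    if mjd_range is not None:
        clauses.append("mjd BETWEEN ? AND ?")
        params.extend(mjd_range)
    if ra_range is not None:
        clauses.append("ra_deg BETWEEN ? AND ?")
        params.extend(ra_range)
    sql = "SELECT visit FROM visit_meta"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY visit"
    return [row[0] for row in conn.execute(sql, params)]


good_seeing = find_visits(conn, max_seeing=0.8)
good_seeing_in_area = find_visits(conn, max_seeing=0.8, ra_range=(149.0, 151.0))
```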


The I/O plugin needs to be configurable on a per-dataset basis.

For example, a user may want intermediate files to go to the local POSIX filesystem, but end products to go to a more permanent storage location.
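
A per-dataset storage configuration could look something like the following sketch. The configuration layout, the URIs, the dataset type names, and `resolve_uri` are all hypothetical; the point is only that a default location can be overridden for specific dataset types.

```python
# Hypothetical configuration: intermediates go to local scratch by default,
# while selected end products are redirected to longer-lived storage.
STORAGE_CONFIG = {
    "default": "file:///scratch/user/run1",
    "overrides": {
        "deepCoadd": "s3://permanent-bucket/run1",
    },
}


def resolve_uri(dataset_type, filename, config=STORAGE_CONFIG):
    """Build the destination URI for a file of the given dataset type."""
    root = config["overrides"].get(dataset_type, config["default"])
    return f"{root.rstrip('/')}/{dataset_type}/{filename}"


intermediate_uri = resolve_uri("calexp", "v100_c40.fits")
product_uri = resolve_uri("deepCoadd", "t8766_p44.fits")
```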


Datasets need to be easy to define.  Every dataset need not implement persistence for every storage context (why would you write an RDBMS storage engine for ImageF?), but implementing persistence for "reasonable" storage contexts should be straightforward.

Examples from SQuaRE would be lsst.verify.Measurement or lsst.verify.Job.
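
To make "easy to define" concrete, here is one possible shape: persistence is registered per (dataset type, storage context) pair, and unregistered pairs simply fail with a clear error. Everything here is a hypothetical sketch. The function names are invented, and a plain dict stands in for a measurement object rather than the real lsst.verify classes.

```python
import json

# Maps (dataset type, storage context) -> (write function, read function).
_FORMATTERS = {}


def register_formatter(dataset_type, context, write, read):
    """Declare how one dataset type is persisted in one storage context."""
    _FORMATTERS[(dataset_type, context)] = (write, read)


def _lookup(dataset_type, context):
    try:
        return _FORMATTERS[(dataset_type, context)]
    except KeyError:
        raise NotImplementedError(
            f"{dataset_type!r} has no persistence for context {context!r}"
        ) from None


def put(dataset_type, context, obj):
    write, _ = _lookup(dataset_type, context)
    return write(obj)


def get(dataset_type, context, payload):
    _, read = _lookup(dataset_type, context)
    return read(payload)


# Defining a new dataset for one "reasonable" context is a single call.
register_formatter("measurement", "json", json.dumps, json.loads)

measurement = {"metric": "example_metric", "value": 12.5, "unit": "mmag"}
payload = put("measurement", "json", measurement)
restored = get("measurement", "json", payload)
```

Asking for a context that was never registered, such as `put("measurement", "rdbms", measurement)`, raises `NotImplementedError` instead of requiring every dataset to support every backend.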


Another use case comes from the QA world: it would be nice to be able to specify datasets based on SQL (or ADQL) queries.  If we have fixed schemas, it would allow us to say Butler.get('high_snr_stars') rather than Butler.get('select * from src where psfFlux/psfFlux_err > 100 and isStar == 1').

Sometimes we have outputs from QA that are related to measurements of metrics, but are not specific LSST classes (e.g. a whisker diagram of the PSF ellipticity as a function of focal plane position).  Do we care about being able to Butler.put binary blobs like that?

We will want to access the same datasets from multiple runs using different code/configuration.
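
Returning to the named-query use case above, the idea can be sketched with a toy class that maps dataset names to stored SQL. `QueryButler`, `NAMED_QUERIES`, and the sample rows are hypothetical and only mimic the `Butler.get('high_snr_stars')` call; the column names are taken from the query quoted in the text.

```python
import sqlite3

# Dataset names mapped to the SQL that defines them (assumes a fixed schema).
NAMED_QUERIES = {
    "high_snr_stars": (
        "SELECT id FROM src "
        "WHERE psfFlux / psfFlux_err > 100 AND isStar = 1 ORDER BY id"
    ),
}


class QueryButler:
    """Toy stand-in for a Butler that resolves dataset names to SQL queries."""

    def __init__(self, conn):
        self.conn = conn

    def get(self, name):
        return [row[0] for row in self.conn.execute(NAMED_QUERIES[name])]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src (id INTEGER PRIMARY KEY, psfFlux REAL, psfFlux_err REAL, isStar INTEGER)"
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?, ?)",
    [
        (1, 5000.0, 10.0, 1),  # SNR 500, star: selected
        (2, 500.0, 10.0, 1),   # SNR 50, star: rejected on SNR
        (3, 9000.0, 10.0, 0),  # SNR 900, not a star: rejected
    ],
)

butler = QueryButler(conn)
high_snr_star_ids = butler.get("high_snr_stars")
```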

Implementation concerns


Will sqlite dbs for the registry scale?
It's clear that we will need an I/O module that can interact with an object store (e.g. S3).  This means that the design should be flexible with respect to the limitations of the various systems: e.g. lack of atomic consistency on object stores, inode contention on GPFS.