@surhudm
Last active November 26, 2021 10:26
LSST pipeline setup

Here I describe my experience setting up the LSST pipeline with the Gen3 butler on an IUCAA server.

Set up the LSST stack

mkdir -p lsst_stack
cd lsst_stack
curl -OL https://raw.githubusercontent.com/lsst/lsst/master/scripts/newinstall.sh
bash newinstall.sh -ct
source loadLSST.bash
eups distrib install -t v23_0_0_rc2 lsst_distrib
curl -sSL https://raw.githubusercontent.com/lsst/shebangtron/master/shebangtron | python
setup lsst_distrib

Set up postgresql

The LSST Gen3 pipeline requires a registry database. This can be done with sqlite3, although that may not be suited for heavy processing; creating an sqlite3 database is as simple as creating an empty file with the chosen name. Here I instead describe the setup of a PostgreSQL server built from source under $HOME, which does not require root access. Run the following from the top of an unpacked PostgreSQL source tree:

./configure --prefix=$HOME
make -j 20
make install
cd contrib
make install
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/lib
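
The export above only lasts for the current shell. A minimal sketch, assuming the $HOME prefix used in the configure step, of lines that could go into a shell startup file such as ~/.bashrc; adding $HOME/bin to PATH is an extra convenience not in the steps above:

```shell
# Append $HOME/lib to the library search path.
# ${VAR:+...} expands to nothing when VAR is unset or empty, which avoids a
# stray leading ':' (an empty entry means "current directory" to the loader).
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$HOME/lib"

# Put the freshly built tools (psql, createdb, pg_ctl, ...) first on PATH.
export PATH="$HOME/bin:$PATH"
```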

Now initialize the database cluster. Change ~/gen3_db to the location where you want your database to reside.

$HOME/bin/initdb -D ~/gen3_db

Open the file ~/gen3_db/postgresql.conf and set listen_addresses to the address or addresses the server should listen on ('*' listens on all interfaces). Raise max_connections as well:

listen_addresses = '*' 
max_connections = 1200

In ~/gen3_db/pg_hba.conf, add the following line (this assumes your InfiniBand network is on 192.168.1.XXX addresses; otherwise change it appropriately):

host    all            all           192.168.1.0/24       md5
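
The two configuration edits above can also be applied without opening an editor. This is only a sketch: `configure_gen3_db` is a hypothetical helper name, and the default subnet is the one assumed above.

```shell
# Hypothetical helper: append the settings above to a PostgreSQL data directory.
# Usage: configure_gen3_db <data_dir> [subnet]
configure_gen3_db() {
    data_dir=$1
    subnet=${2:-192.168.1.0/24}   # network allowed to connect; adjust to yours

    # When a parameter appears more than once in postgresql.conf, the last
    # occurrence wins, so appending overrides the shipped defaults.
    cat >> "$data_dir/postgresql.conf" <<EOF
listen_addresses = '*'
max_connections = 1200
EOF

    # Password (md5) authentication for every database and user on the subnet.
    printf 'host    all    all    %s    md5\n' "$subnet" >> "$data_dir/pg_hba.conf"
}
```

It would be called as `configure_gen3_db ~/gen3_db` after initdb has created the directory and before the server is started.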

Now start the postgresql server:

$HOME/bin/pg_ctl -D ~/gen3_db -l logfile start
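
Before creating the database it can help to confirm that the server is accepting connections. A minimal sketch, assuming pg_isready was installed into $HOME/bin by the build above; `wait_for_postgres` and the `PG_ISREADY` override are hypothetical names:

```shell
# Hypothetical helper: poll until the server accepts connections.
# Usage: wait_for_postgres [attempts]
wait_for_postgres() {
    tries=${1:-30}                                   # one-second attempts
    pg_isready_bin=${PG_ISREADY:-$HOME/bin/pg_isready}
    while [ "$tries" -gt 0 ]; do
        # pg_isready exits 0 once the server is accepting connections.
        if "$pg_isready_bin" -q; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "PostgreSQL did not become ready; check the logfile" >&2
    return 1
}
```

If `wait_for_postgres` returns non-zero, the logfile passed to pg_ctl is the first place to look.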

First create the database, then open it in psql, add the btree_gist extension, and set a password:

$HOME/bin/createdb gen3
$HOME/bin/psql gen3
gen3=# CREATE EXTENSION btree_gist;
gen3=# \password

Now that this has been set up, you can change every trust authentication entry in ~/gen3_db/pg_hba.conf to md5 and reload the server (pg_ctl -D ~/gen3_db reload), so that all access is password based. You can supply the password through the environment variable PGPASSWORD or write it in clear text in a $HOME/.pgpass file.

export PGPASSWORD=YourPassword
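
For the password file route, each line has the form hostname:port:database:username:password, and the client ignores the file unless its permissions are 0600 or stricter. A minimal sketch; `add_pgpass_entry` is a hypothetical helper, and the values in the usage note are placeholders:

```shell
# Hypothetical helper: add one entry to a PostgreSQL password file.
# Usage: add_pgpass_entry <host> <port> <database> <user> <password> [file]
add_pgpass_entry() {
    pgpass_file=${6:-$HOME/.pgpass}
    # Restrict permissions before any password is written into the file.
    touch "$pgpass_file"
    chmod 600 "$pgpass_file"
    # One entry per line: hostname:port:database:username:password
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" >> "$pgpass_file"
}
```

For example, `add_pgpass_entry server_ip_address 5432 gen3 username YourPassword`. A password containing `:` or `\` needs those characters escaped with a backslash, which this sketch does not do.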

Create a gen3 repository

Now let us create a space for the gen3 repository.

mkdir $HOME/gen3_repo

Set up the butler and register the instrument (in our case Subaru HSC). Replace username and server_ip_address with your database user and host.

echo "registry:" > reg.yaml
echo "  db: postgresql://username@server_ip_address/gen3" >> reg.yaml

DIR=$HOME/gen3_repo
butler create $DIR --seed-config reg.yaml --override
butler register-instrument $DIR lsst.obs.subaru.HyperSuprimeCam
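
The seed config can also be written in one step with a heredoc, which keeps the YAML indentation visible. A sketch only; `write_registry_config` is a hypothetical helper, and its arguments stand for the database user, host, and database name used above:

```shell
# Hypothetical helper: write a butler seed config for a PostgreSQL registry.
# Usage: write_registry_config <user> <host> <database> [output_file]
write_registry_config() {
    out=${4:-reg.yaml}
    # The two-space indent before "db:" is significant in YAML.
    cat > "$out" <<EOF
registry:
  db: postgresql://$1@$2/$3
EOF
}
```

Calling `write_registry_config username server_ip_address gen3` produces the same two lines as the echo commands.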

Data finally!

If you have access to the gen3-shared-repo-admin tools, skip this and go down one section.

Let us ingest some raw data from the directory $HOME/Subaru_rawdata now. Depending upon the size of your data, this can take a really loooooooooooong time.

butler --progress ingest-raws $DIR $HOME/Subaru_rawdata -t direct > rawingest.log 2>&1 &
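
When a long job is sent to the background, the order of the redirections decides whether error messages reach the log file. The following sketch uses a stand-in function (`noisy_job`, hypothetical) instead of butler so that it can be run anywhere:

```shell
# Stand-in for a long-running command that writes to both streams.
noisy_job() {
    echo "progress on stdout"
    echo "warning on stderr" >&2
}

log=${TMPDIR:-/tmp}/demo_ingest.log

# '> file' first, then '2>&1': stderr is pointed at the already redirected
# stdout, so both streams land in the log. The reverse order would leave
# stderr attached to the terminal.
noisy_job > "$log" 2>&1 &
job_pid=$!

# Block until the background job finishes and pick up its exit status.
wait "$job_pid"
job_status=$?
echo "job exited with status $job_status"
```

While the real ingest is running, `tail -f rawingest.log` shows its progress.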

Define each exposure as a single visit using the next command:

butler define-visits $DIR HSC

Since the raws are already ingested, we do not have to ingest them again during the conversion that follows. So create a file called skipraws.py with the following contents:

# skipraws.py file
import lsst.obs.base.gen2to3.convertRepo
  
assert type(config)==lsst.obs.base.gen2to3.convertRepo.ConvertRepoConfig, 'config is of type %s.%s instead of lsst.obs.base.gen2to3.convertRepo.ConvertRepoConfig' % (type(config).__module__, type(config).__name__)

config.datasetIgnorePatterns=["raw"]
config.doMakeUmbrellaCollection=False
config.doExpandDataIds=False

If you have a gen2 root directory that holds your previous processing, say from HSCpipe, you can use it here and ingest the skymaps, reference catalogs, and calibrations with the next command:

GEN2ROOT=$HOME/gen2root
butler --progress convert $DIR --gen2root $GEN2ROOT -C skipraws.py -t direct > gen2convert.log 2>&1 &

If you do not have one, you can download the calibration data from https://www.subarutelescope.org/Observing/Instruments/HSC/calib_data.html. You then need to create a gen2 repository and ingest these calibrations into it first, following the instructions at https://hsc.mtk.nao.ac.jp/pipedoc/pipedoc_8_e/ . Specifically, you need to initialize the repository and use ingestRaws to ingest a couple of exposures. Then ingest all the calibrations (CALIB, SKY, FLAT, BIAS, DARK) following the procedure written there. Once you have all this, you can get your skymaps, refcats, and calibrations from this gen2 repository using the command above.

After this you can also inherit any of your reruns one after the other. For more complicated rerun ingestion you should take a look at the script convert.py available here https://github.com/lsst/obs_base/blob/master/python/lsst/obs/base/script/convert.py and play around with it.

Advanced repositories: gen3 shared repo admin tools

Next we can use some butler-admin tools to ingest data into this repository. However, the tools repository is invitation-only at the moment.

mkdir $HOME/github
cd $HOME/github
git clone git@github.com:lsst-dm/gen3_shared_repo_admin.git