@Yu-AnChen
Last active April 13, 2023 16:47

For large files, submit multiple parallel jobs of omero import

P1. Find all the files to be uploaded, write their paths to a CSV file (mosaics.csv), and review the list. For example, run the following in the terminal to append the matching file paths to mosaics.csv:

find /n/scratch3/users/y/yc296/216-OMS_2022MAR-2022MAR/mcmicro -type f -wholename '*/registration/*.ome.tif' | sort >> mosaics.csv
find /n/scratch3/users/y/yc296/216-OMS_2022MAR-2022MAR/mcmicro -type f -wholename '*/qc/*.ome.tif' | sort >> mosaics.csv
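Because the commands above append with `>>`, rerunning them can leave duplicate entries in the list. As a minimal sketch of the "review the list" step, the hypothetical helper below (`check_list` is not part of the original gist) counts the entries, the duplicated paths, and the paths that no longer point at a regular file:

```shell
# Hypothetical helper (not part of the original gist): summarize a path list
# before submitting any import jobs.
check_list() {
    list="$1"
    total=$(wc -l < "$list" | tr -d ' ')
    dupes=$(sort "$list" | uniq -d | wc -l | tr -d ' ')
    missing=0
    while IFS= read -r path; do
        # Count entries that do not point at an existing regular file.
        [ -f "$path" ] || missing=$((missing + 1))
    done < "$list"
    echo "entries=$total duplicates=$dupes missing=$missing"
}
```

Calling `check_list mosaics.csv` prints a single summary line; a nonzero duplicates or missing count suggests the list should be regenerated before any jobs are submitted.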

P2. Follow steps 1 and 3 here.

P3. For each file in mosaics.csv, submit a SLURM job to import the image into OMERO

Run the following command, replacing DATASET_ID with the ID number of your Dataset from Step P2 (step 1 in the linked gist). The omero import command will run inside a SLURM job so you can log out without interrupting the import. If the import does not complete within the 12 hour time limit, begin again from Step 2 (The --exclude=clientpath option will skip any file that was previously imported, so you don't need to worry about duplicate imports).

cat /home/yc296/project/20220929-216-OMS_2022MAR-2022MAR/mosaics.csv | xargs -I {} sbatch -p short -t 0-12 --mem 2G --wrap 'module load omero && omero import --exclude=clientpath --skip=upgrade --skip=checksum --skip=minmax -d DATASET_ID {}' && sleep 1
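Note that in the one-liner above the trailing `&& sleep 1` sits outside `xargs`, so it runs once after all jobs have been submitted rather than between submissions. If a pause between submissions is wanted, an explicit loop is one option. The sketch below is an assumption-laden alternative, not the author's method: `submit_imports`, `DRY_RUN`, and `PAUSE` are hypothetical names, and by default it only prints the `sbatch` commands so they can be reviewed first.

```shell
# Sketch: one import job per path in a list, pausing between submissions.
# DRY_RUN defaults to 1 (print the sbatch commands only); DRY_RUN and PAUSE
# are hypothetical knobs introduced for this example.
submit_imports() {
    list="$1"
    dataset_id="$2"
    while IFS= read -r path; do
        [ -n "$path" ] || continue   # skip blank lines
        # Paths containing a single quote would need extra escaping here.
        wrap="module load omero && omero import --exclude=clientpath --skip=upgrade --skip=checksum --skip=minmax -d $dataset_id '$path'"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'sbatch -p short -t 0-12 --mem 2G --wrap "%s"\n' "$wrap"
        else
            sbatch -p short -t 0-12 --mem 2G --wrap "$wrap" < /dev/null
        fi
        sleep "${PAUSE:-1}"
    done < "$list"
}
```

For example, `submit_imports mosaics.csv DATASET_ID` prints the commands, and `DRY_RUN=0; submit_imports mosaics.csv DATASET_ID` submits them.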

If all images are relatively small, upload all the files sequentially

S1. Follow steps P1 and P2 above

S2. Create a YAML file pointing to the above mosaics.csv, for example:

omero-bulk.yml file

---
path: "/home/yc296/project/20220929-216-OMS_2022MAR-2022MAR/mosaics.csv"
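The YAML file can also be generated from the command line. The sketch below writes the same two lines shown above, using only the `path` key from the example; `write_bulk_yaml` is a hypothetical helper, and resolving the list to an absolute path is an assumption based on the example using one.

```shell
# Hypothetical helper: write a bulk-import YAML file pointing at a path list.
write_bulk_yaml() {
    list="$1"
    out="${2:-omero-bulk.yml}"
    # Resolve the list to an absolute path (assumption: mirrors the example).
    abs="$(cd "$(dirname "$list")" && pwd)/$(basename "$list")"
    printf '%s\n' '---' "path: \"$abs\"" > "$out"
}
```

Running `write_bulk_yaml mosaics.csv` in the directory that holds mosaics.csv creates omero-bulk.yml next to it.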

S3. Bulk import

Run the following command, replacing DATASET_ID with the ID number of your Dataset from Step P2 (step 1 in the linked gist). The omero import command will run inside a SLURM job so you can log out without interrupting the import. If the import does not complete within the 12 hour time limit, begin again from Step 2 (The --exclude=clientpath option will skip any file that was previously imported, so you don't need to worry about duplicate imports).

sbatch -p short -t 0-12 --mem 2G --wrap 'omero import --exclude=clientpath --skip=upgrade --skip=checksum --skip=minmax -d DATASET_ID --bulk omero-bulk.yml'

src: https://www.synapse.org/#!Synapse:syn26470573

Importing data to HMS OMERO

1. Identify an appropriate OMERO dataset or create a new one

Navigate to https://omero.hms.harvard.edu/ in a web browser and log in.

Within the relevant User/Group, review existing Projects. If you see a Project named for the relevant project in Experiment Tracker, click on it. Otherwise, create a new Project by selecting the blue Project folder icon, replacing "Project Name" with the name of your Project as listed in Experiment Tracker.

Now create a Dataset inside your Project by selecting the green Dataset folder icon, replacing "Dataset Name" with the name for your new Dataset. For users within the LSP, the Dataset should be named according to its Experiment ID and Title within Experiment Tracker. Note the new Dataset ID number when the Dataset is created.

2. Transfer files to scratch3 (SKIP if your image is already on O2)

Files cannot be transferred directly from /n/files (research.files.med.harvard.edu) to OMERO and must instead be transferred via scratch3. If your files are already available on scratch3, continue to the next step.

2a. Create a new destination folder on scratch3

If your files are not on scratch3 and you also need to create a destination folder on scratch3, connect to O2 replacing zz999 with your HMS ID:

ssh zz999@O2.hms.harvard.edu

Change to your scratch3 directory and create a destination folder. If you don't have a scratch3 directory, you must create one first. z is the first letter of your HMS ID, zz999 is your HMS ID, and FOLDER is your new destination folder:

cd /n/scratch3/users/z/zz999
mkdir FOLDER
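The scratch3 path pattern described above (first letter of the HMS ID, then the full ID, then the folder) can be assembled from the ID itself. This is a small sketch only; `scratch_dir` is a hypothetical helper name.

```shell
# Hypothetical helper: build the scratch3 path described above from an HMS ID.
scratch_dir() {
    hms_id="$1"
    folder="$2"
    first=$(printf '%.1s' "$hms_id")   # first letter of the HMS ID
    printf '/n/scratch3/users/%s/%s/%s\n' "$first" "$hms_id" "$folder"
}
```

For example, `mkdir -p "$(scratch_dir zz999 FOLDER)"` would create the destination folder, assuming the user's scratch3 directory already exists.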

2b. Transfer files to scratch3

Once a destination folder has been created on scratch3, SSH to O2 file transfer servers replacing zz999 with your HMS ID:

ssh zz999@transfer.rc.hms.harvard.edu

Change directory (cd) to the current location of the files under /n/files/.

Copy files to scratch3, where IMAGE.ome.tif is the file being transferred, z is the first letter of your HMS ID, zz999 is your HMS ID, and FOLDER is the destination folder:

rsync -avP IMAGE.ome.tif /n/scratch3/users/z/zz999/FOLDER

If it succeeds, you will see output similar to the following:

sending incremental file list
IMAGE.ome.tif
 45,063,208,960   8%  152.66MB/s    0:53:17

Once the file(s) have finished transferring to scratch3 you will be able to transfer files to OMERO. Note: Files will only remain on scratch3 for 30 days.

3. Connect to O2 and set up your session for omero CLI

SSH to O2

ssh <ecommons>@o2.hms.harvard.edu

Launch interactive job

srun -p interactive --pty --mem 200M -c 1 -t 0-00:10 bash

After getting a compute node (for example, seeing [<ecommons>@compute-e-16-233 ~]), load the omero module

module load omero

and log in with your ecommons ID and password when prompted

omero login -t 86400 omero-app.hms.harvard.edu:4064

If it succeeds, you will see output similar to the following:

Created session for <ecommons>@omero-app.hms.harvard.edu:4064. Idle timeout: 100 min. Current group: Ludwig Center at Harvard

If the "Current group" displayed on login does not match the Group in which you created the Dataset in Step 1, run the following command but substitute GROUP NAME with the exact full name of the Group:

omero sessions group 'GROUP NAME'

4. Import the images into OMERO

From O2, make sure you are in the appropriate directory for the files you are transferring, where z is the first letter of your HMS ID, zz999 is your HMS ID, and FOLDER is the destination folder:

cd /n/scratch3/users/z/zz999/FOLDER

Run the following command, replacing DATASET_ID with the ID number of your Dataset from Step 1 and replacing IMAGE.ome.tif with the filename to import. You may specify multiple files in a single import command if you have more than one file. Filenames containing spaces or other special characters must be surrounded with "double quotes". The omero import command will run inside a SLURM job so you can log out without interrupting the import. If the import does not complete within the 12 hour time limit, begin again from Step 2 (The --exclude=clientpath option will skip any file that was previously imported, so you don't need to worry about duplicate imports).

sbatch -p short -t 0-12 --mem 2G --wrap 'omero import --exclude=clientpath --skip=upgrade --skip=checksum --skip=minmax -d DATASET_ID IMAGE.ome.tif'

If the job is successfully submitted, you will see output similar to the following, where JOBID is the ID number used to track your job:

Submitted batch job JOBID
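When several files are imported in one command, the double-quoting rule from step 4 can be applied mechanically. The sketch below prints an import command line with every filename wrapped in double quotes; `import_command` is a hypothetical helper, and filenames that themselves contain double quotes, `$`, or backticks would need further escaping.

```shell
# Hypothetical helper: print an `omero import` command line for several files,
# wrapping each filename in double quotes.
import_command() {
    dataset_id="$1"
    shift
    cmd="omero import --exclude=clientpath --skip=upgrade --skip=checksum --skip=minmax -d $dataset_id"
    for f in "$@"; do
        cmd="$cmd \"$f\""
    done
    printf '%s\n' "$cmd"
}
```

The printed string can then be passed to sbatch, for example `sbatch -p short -t 0-12 --mem 2G --wrap "$(import_command DATASET_ID *.ome.tif)"`.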

5. Verify the import is visible in OMERO

Reload the OMERO webpage and navigate to your new dataset. Verify that all images are present.
