@marrabld
Last active March 22, 2018 10:24

The Challenge

Subset a list of files from NCI and transfer them to Pawsey. The directory structure needs to be preserved from NCI to Pawsey. The subset is a few hundred TB and needs to be publicly accessible.

Challenge 1.

You can't use pshell in the following way.

pshell cp <local file>  <remote location>

Instead, you have to run pshell interactively: you launch it and work inside it as a program.

Challenge 2a.

You can't use pshell in the following way

pshell put my_local_file.txt /a/b/c/

where /a/b/c/ is some remote directory.

Instead you must

pshell cd /a/b/c/ && put my_local_file

Challenge 2b.

You can't use pshell in the following way

 pshell cd <remote location>
 pshell put <local file>

The second line will cause pshell to forget which directory you changed to, and it will put your local file into the root of your project directory.

But you can chain commands together

 pshell cd <remote location> && put <local file>

Challenge 3.

You can't push a file to a remote directory if the directory doesn't exist.

You CAN make a new directory, but you CAN'T check whether the directory already exists. And you can't ask pshell to create the parent directories, as mkdir -p does in bash.

pshell mkdir /a/b/c/d && cd /a/b/c/d && put <local file>

will fail if a, b or c do not exist, and will fail if d already exists. But you CAN'T check whether they exist or not.

Challenge 4.

You CAN put a top-level directory and it WILL preserve the directory structure. However, a) if run from NCI, it will try to push the entire directory, which includes the whole archive and not just the subset we require; b) if we run it from Landgate, we need to subset the data and cache it locally. In case b), we either download the entire subset, which is 100s of TB, or we pull a few files at a time, push them, then pull a few more. This means we need to keep track of all the files that we pulled, to make sure we get them all and don't duplicate the downloads. This would be fine if the downloads didn't fail all the time. Because they fail, we need to do a lot of checking and error handling from the command line.
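The bookkeeping for the pull-a-few, push-a-few approach could be sketched as below. This is only an illustration and not part of the original workflow: the log file, the batch size and all function names are hypothetical.

```python
from pathlib import Path


def load_done(log_path):
    """Return the set of paths already recorded as transferred."""
    log = Path(log_path)
    if not log.exists():
        return set()
    return {line.strip() for line in log.read_text().splitlines() if line.strip()}


def pending_batches(file_list, done, batch_size):
    """Yield lists of at most batch_size paths not in done, skipping duplicates."""
    seen = set(done)
    batch = []
    for path in file_list:
        if path in seen:
            continue
        seen.add(path)
        batch.append(path)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def mark_done(log_path, path):
    """Append one successfully transferred path to the log."""
    with open(log_path, "a") as fh:
        fh.write(path + "\n")
```

Calling mark_done only after a transfer is known to have succeeded means a rerun picks up exactly the files that are still missing. That still depends on being able to tell whether a transfer succeeded.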

Challenge 5.

You can't just publish the top-level directory. You must publish each file separately.

 pshell publish <root directory>

will fail.

You must

pshell mkdir /a/b/c/d && cd /a/b/c/d && put <local_file> && publish /a/b/c/d/<local_file>

Solution

Pawsey implements the following functionality

pshell cp <local file> <remote folder>
AND
pshell mkdir -p
AND
pshell publish -r <root directory>

Challenge 6.

Pawsey transfers often fail. Because pshell needs to be wrapped in a bash script and doesn't do any sane error handling, if the transfer fails the shell script keeps executing and I can't tell whether the transfer was successful or not.

Solution

pshell should use stdout and stderr
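If pshell did report failures through its exit status and stderr, a wrapper could stop or retry on the first error. A minimal sketch of that idea follows, using stand-in commands rather than pshell itself; the function name run_checked is hypothetical, and the sketch assumes a tool that exits non-zero on failure, which is exactly what pshell does not currently do.

```python
import subprocess
import sys


def run_checked(argv):
    """Run a command; return (ok, stdout, stderr) based on its exit status."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0, result.stdout, result.stderr


# Stand-in commands (not pshell): one succeeds, one fails.
ok, out, err = run_checked([sys.executable, "-c", "print('transferred')"])
failed_ok, failed_out, failed_err = run_checked(
    [sys.executable, "-c", "import sys; sys.stderr.write('transfer failed\\n'); sys.exit(1)"]
)
```

Here ok is true for the first command and failed_ok is false for the second, with the error text available in failed_err for logging.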

Workaround (current) - non-caching

Unzip pshell, then monkey patch it to wrap the mkdir command in a try: ... except: pass block.
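The idea behind the patch can be sketched as follows. The class below is a hypothetical stand-in, not pshell's real client code, and the real patch was made by editing the pshell sources directly; this only shows the effect of swallowing the error when a directory already exists.

```python
import functools


def ignore_errors(func):
    """Wrap func so any exception is swallowed, like 'try: ... except: pass'."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            return None
    return wrapper


class FakeRemote:
    """Hypothetical stand-in for the remote store; NOT pshell's real client class."""

    def __init__(self):
        self.dirs = set()

    def mkdir(self, path):
        if path in self.dirs:
            raise RuntimeError("directory already exists: " + path)
        self.dirs.add(path)


# The monkey patch: replace the method with its error-tolerant version.
FakeRemote.mkdir = ignore_errors(FakeRemote.mkdir)

remote = FakeRemote()
remote.mkdir("/a")
remote.mkdir("/a")  # would have raised before the patch; now a no-op
```

Swallowing every exception also hides genuine failures, such as a missing parent directory, so this makes the chained mkdir calls repeatable but makes error detection harder.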

Zip the patched __main__.py and __mf_client__.py files into pshell.zip.

Then execute the following command

$ echo '#!/usr/bin/env python' | cat - pshell.zip > pshell
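This works because Python can run a zip archive that contains a __main__.py, even when a shebang line is prepended to it. The standard library's zipapp module builds the same kind of file. A sketch using a throwaway directory rather than the real pshell sources (the file names here are made up for the demonstration):

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Throwaway stand-in for the unzipped, patched sources (not the real pshell code).
workdir = Path(tempfile.mkdtemp())
src = workdir / "src"
src.mkdir()
(src / "__main__.py").write_text("print('hello from the archive')\n")

# Zip the directory and prepend '#!/usr/bin/env python' in one step.
target = workdir / "demo_app"
zipapp.create_archive(src, target, interpreter="/usr/bin/env python")

# Inspect the shebang line, then run the archive with the current interpreter.
first_line = target.read_bytes().split(b"\n", 1)[0]
output = subprocess.run(
    [sys.executable, str(target)], capture_output=True, text=True
).stdout
```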

Then run the following script

# For each NCI path in file_list.txt, build one chained pshell command:
# mkdir each parent directory in turn, cd into the last one, put the file, then publish it.
for str in $(cat ./file_list.txt); do
  var=$(echo $str | sed 's,/g/data3/fj7/Copernicus/,,g' |
    awk 'BEGIN{FS="/"; strA=""; strB="";}{ for (i=1;i<NF;i++){ strA=strA" mkdir "strB""$i" && "; strB=strB""$i"/";}}END{print strA, " cd /projects/WACopernicus/"strB, " && put "}' |
    xargs -I% echo "cd /projects/WACopernicus && " % $str " && publish " $(echo $str | sed 's,/g/data3/fj7/Copernicus/,/projects/WACopernicus/,g') |
    xargs -I% echo \"%\")
  echo "python pshell" $var
done > bigList.txt

This will generate a file, bigList.txt, which contains the required pshell commands.
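For readers who find the shell pipeline hard to follow, the same command construction can be sketched in Python. This is an illustrative re-expression, not the script that was actually run; the function name build_command and the example path a/b/file.txt are hypothetical, while the two root paths are taken from the script above.

```python
NCI_ROOT = "/g/data3/fj7/Copernicus/"
PAWSEY_ROOT = "/projects/WACopernicus/"


def build_command(local_path):
    """Build one chained pshell command string for a single NCI file path."""
    if local_path.startswith(NCI_ROOT):
        relative = local_path[len(NCI_ROOT):]
    else:
        relative = local_path
    parents = relative.split("/")[:-1]

    # Start in the project root so the relative mkdir calls land in the right place.
    steps = ["cd " + PAWSEY_ROOT.rstrip("/")]
    prefix = ""
    for name in parents:
        # One mkdir per level, because pshell has no mkdir -p.
        prefix += name + "/"
        steps.append("mkdir " + prefix.rstrip("/"))
    steps.append("cd " + PAWSEY_ROOT + prefix)
    steps.append("put " + local_path)
    steps.append("publish " + PAWSEY_ROOT + relative)
    return 'python pshell "' + " && ".join(steps) + '"'


example = build_command("/g/data3/fj7/Copernicus/a/b/file.txt")
```

Each mkdir in the chain only succeeds on repeated runs because of the patched, error-swallowing mkdir described above.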
