@universvm
Last active January 8, 2021 11:05
Leaving a long job running on CDT Machines

Preface

If you need to leave a long job running, such as a program that runs for more than 5 hours or a Jupyter notebook, you may have noticed errors such as "Broken Pipe", or messages saying that your access to AFS has expired:

rm: cannot remove '/afs/inf.ed.ac.uk/user/s20/UUN/.last_login': Permission denied
-bash: /afs/inf.ed.ac.uk/user/s20/UUN/.last_login: Permission denied
-bash: cd: /afs/inf.ed.ac.uk/user/s20/UUN: Permission denied
-bash: /afs/inf.ed.ac.uk/user/s20/UUN/.bash_profile: Permission denied

Solution

To solve this, first start a tmux session, then request tokens (kinit obtains a Kerberos ticket, aklog turns it into an AFS token), and finally run the command with longjob.

tmux 
conda activate {YOUR PYTHON ENVIRONMENT WITH JUPYTER LAB}
kinit && aklog
longjob -28day -c "jupyter lab --no-browser"
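
Before starting a job that is meant to run for days, it can help to confirm that the four commands used above are actually available in the current shell. The following is a minimal sketch, not part of the original recipe; the function name missing_tools is made up for illustration, and it only checks that the commands exist on PATH, not that your tickets are valid.

```shell
#!/bin/sh
# Print each named command that cannot be found on PATH.
# (missing_tools is a hypothetical helper name, not a CDT-provided tool.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the four commands used in the steps above.
missing=$(missing_tools tmux kinit aklog longjob)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All required commands found."
fi
```

If anything is reported as missing, fix that first (for example by logging in to the right machine) before running kinit && aklog and longjob.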

Debugging

If for some reason you get errors such as:

Waiting for job to start...
krenew: unable to run command (nohup: No such file or directory
krenew: error reading ticket cache: No credentials cache found (filename: /tmp/{SOME NUMBERS})
krenew: cannot destroy ticket cache: No credentials cache found (filename: /tmp/{SOME NUMBERS})

In my experience, this usually means that something in the command is misspelled. If that is not the case, try running

kdestroy

multiple times until you get an error saying there is nothing left to destroy. Then request fresh tokens with kinit && aklog and run the longjob command again.
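
The repeated kdestroy can be wrapped in a small loop. This is only a sketch under assumptions: it assumes kdestroy exits with a non-zero status once there is no credentials cache left (check this on your machine), the function name clear_ticket_caches is invented here, and the KDESTROY variable is a hypothetical override that exists only so the loop can be tried without touching real tickets.

```shell
#!/bin/sh
# Run kdestroy until it fails (assumed to mean "nothing left to destroy")
# or until a safety cap is reached, then print how many runs succeeded.
# KDESTROY is a hypothetical override for dry runs; it defaults to kdestroy.
clear_ticket_caches() {
  max_attempts=${1:-10}
  destroyed=0
  while [ "$destroyed" -lt "$max_attempts" ] && "${KDESTROY:-kdestroy}" 2>/dev/null; do
    destroyed=$((destroyed + 1))
  done
  echo "$destroyed"
}
```

Calling clear_ticket_caches with no arguments makes at most 10 attempts; pass a different number to change the cap. Afterwards, request new tokens with kinit && aklog as in the Solution section.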

Reference

https://gist.github.com/goweiting/6b11f4ef3e18188d04800b2b2970977f
