It behaves identically to rm, except that it won't delete files marked with important or .keep (customizable through SAFE_RM_IMPORTANCE_MARK). The idea is to mark important files beforehand so that they always survive future disk cleanups.
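To make the skip rule described above concrete, here is a minimal Python sketch. It is not the tool's actual implementation. It assumes that a file counts as marked when its name contains one of the marks, and that SAFE_RM_IMPORTANCE_MARK holds a comma-separated list. Both are guesses for illustration, and the helper names (load_marks, is_protected, safe_remove) are hypothetical. This simple name check does not cover every case in the example session, such as the files kept under checkpoint/baseline.

```python
import os
from pathlib import Path

# Assumption: defaults taken from the description above; the real tool may differ.
DEFAULT_MARKS = ("important", ".keep")


def load_marks(env=None):
    """Return the marks to protect, reading SAFE_RM_IMPORTANCE_MARK if it is set.

    Assumption: the variable is a comma-separated list; the real format may differ.
    """
    env = os.environ if env is None else env
    raw = env.get("SAFE_RM_IMPORTANCE_MARK", "")
    marks = tuple(m.strip() for m in raw.split(",") if m.strip())
    return marks or DEFAULT_MARKS


def is_protected(path, marks):
    """Hypothetical rule: a file is protected if its name contains any mark."""
    name = Path(path).name
    return any(mark in name for mark in marks)


def safe_remove(paths, marks=None):
    """Delete unprotected files, print 'skip <path>' for protected ones.

    Returns the list of skipped paths.
    """
    marks = load_marks() if marks is None else marks
    skipped = []
    for p in paths:
        if is_protected(p, marks):
            print(f"skip {p}")
            skipped.append(p)
        else:
            os.remove(p)
    return skipped
```

Under these assumptions, calling safe_remove(["note.keep", "main.cc"]) would delete main.cc and report note.keep as skipped.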
What prompted me to create this was accidentally removing some baseline models, which had taken many GPU hours to train and which all my current training jobs depend on. To prevent future disasters, I came up with this solution. I hope you find it useful too.
> tree . # before deletion
├── checkpoint
│   ├── baseline
│   │   ├── log.txt
│   │   └── model.pt
│   ├── finetuned
│   │   ├── log.txt
│   │   └── model.pt
│   ├── log.txt
│   └── note
│       ├── _record
│       │   ├── __init__.py
│       │   └── important_file
│       ├── log1.txt
│       └── log2.txt
├── main.cc
├── model.pt
└── note.keep
> rm -rf ./* # deletion
skip ./checkpoint/note/_record/important_file
skip ./checkpoint/baseline/model.pt
skip ./checkpoint/baseline/log.txt
skip ./note.keep
> tree . # after deletion, important files are preserved
├── checkpoint
│   ├── baseline
│   │   ├── log.txt
│   │   └── model.pt
│   └── note
│       └── _record
│           └── important_file
└── note.keep