@Zejnilovic
Created August 17, 2020 16:30
Remove old files from Hadoop tmp
today=$(date +'%s') # current time as a Unix timestamp
granularity=$(( 24*60*60 )) # seconds per unit of age; currently one day
olderThan=7 # entries more than olderThan units of granularity old are selected
# Read the listing line by line (a `for` loop over the output would split on every space).
# tail -n +2 skips the "Found N items" header line.
hdfs dfs -ls /tmp | tail -n +2 | while read -r line; do
  dir_date=$(echo "${line}" | awk '{print $6 " " $7}') # modification date and time
  # difference=$(( ( today - $(date -j -u -f "%Y-%m-%d %H:%M" "${dir_date}" +%s) ) / granularity )) # MacOS
  difference=$(( ( today - $(date -d "${dir_date}" +%s) ) / granularity )) # Linux
  filePath=$(echo "${line}" | awk '{print $8}') # note: paths containing spaces are truncated here
  if [ "${difference}" -gt "${olderThan}" ]; then
    # hadoop fs -rm -r "${filePath}" # uncomment to actually delete
    echo "${filePath}"
  fi
done
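
The age calculation can be checked without a cluster by feeding it a hand-written listing line. The sketch below is only an illustration: `age_in_units` is a hypothetical helper name, the sample line is made up to match the usual `hdfs dfs -ls` column layout (date in column 6, time in column 7, path in column 8), and it assumes GNU `date` (Linux). Both timestamps are parsed as UTC so the result does not depend on the local time zone.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the age of one listing line in whole units.
# Usage: age_in_units LINE NOW_EPOCH GRANULARITY_SECONDS
age_in_units() {
  local line=$1 now=$2 granularity=$3
  local stamp modified
  # Columns 6 and 7 of the listing hold the modification date and time.
  stamp=$(echo "${line}" | awk '{print $6 " " $7}')
  # GNU date only; on macOS use: date -j -u -f "%Y-%m-%d %H:%M" "${stamp}" +%s
  modified=$(date -u -d "${stamp}" +%s) || return 1
  echo $(( (now - modified) / granularity ))
}

# Made-up listing line and a fixed "now", so the result is reproducible.
sample='drwxr-xr-x   - hdfs supergroup          0 2020-08-01 10:15 /tmp/example_dir'
now=$(date -u -d '2020-08-17 16:30' +%s)

age_in_units "${sample}" "${now}" $(( 24*60*60 )) # prints 16
```

With `olderThan=7`, this sample entry would be selected, since 16 is greater than 7.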