@ns-mkusper
Last active June 6, 2022 21:24
Script for removing older files in an HDFS directory
#!/bin/bash
usage="Usage: ./remove_older_hdfs_files.sh <path> <days>"
# Uncomment to raise the client heap when listing very large directories
# export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx5g"
# both the HDFS path and the age threshold in days are required
if [ -z "$1" ] || [ -z "$2" ]
then
    echo "$usage" >&2
    exit 1
fi
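now=$(date +%s)
batch_size=5000
declare -a arr=()
# Feed the loop through process substitution instead of a pipe: a piped
# `while` runs in a subshell, so arr would be empty again after `done`.
while read -r _ _ _ _ _ file_date _ file_name; do
    # skip the "Found N items" header line, which has no path field
    [ -n "$file_name" ] || continue
    diff=$(( (now - $(date -d "$file_date" +%s)) / (24 * 60 * 60) ))
    if [ "$diff" -gt "$2" ]; then
        arr+=("$file_name")
        # delete in batches so the argument list stays a manageable size
        if [ "${#arr[@]}" -ge "$batch_size" ]; then
            hdfs dfs -rm -r -skipTrash "${arr[@]}"
            arr=()
        fi
    fi
done < <(hdfs dfs -ls "$1")
# delete whatever is left over from the last partial batch
if [ "${#arr[@]}" -gt 0 ]; then
    hdfs dfs -rm -r -skipTrash "${arr[@]}"
fi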
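
Because `-skipTrash` deletes permanently, it can help to check which paths a given threshold would select before running the real thing. The following is a minimal sketch, assuming GNU `date` and the usual eight-column `hdfs dfs -ls` layout. `select_old_paths` is a hypothetical helper, and the sample listing is made-up data. It parses dates as UTC so the example is reproducible, whereas the script compares in the client's local time.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): read `hdfs dfs -ls`
# style lines on stdin and print every path that is more than $1 days older
# than the epoch time given in $2 (defaults to the current time).
select_old_paths() {
    local days=$1 now=${2:-$(date +%s)}
    local file_date path age
    while read -r _ _ _ _ _ file_date _ path; do
        # The "Found N items" header has no eighth field, so skip it.
        [ -n "$path" ] || continue
        age=$(( (now - $(date -u -d "$file_date" +%s)) / 86400 ))
        if [ "$age" -gt "$days" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Made-up listing, evaluated against a fixed "now" of 2022-06-06 UTC.
fixed_now=$(date -u -d 2022-06-06 +%s)
select_old_paths 7 "$fixed_now" <<'EOF'
Found 2 items
-rw-r--r--   3 hdfs hdfs   1024 2022-05-01 10:00 /data/old.log
-rw-r--r--   3 hdfs hdfs   2048 2022-06-05 09:30 /data/new.log
EOF
```

With a 7-day threshold only `/data/old.log` is printed. Against a real cluster the same function could be fed with `hdfs dfs -ls <path> | select_old_paths <days>`.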