Created March 8, 2018 09:07
# export the two full HDFS paths to compare before running this script;
# each path must end with a trailing "/" (!). Example values:
#export FULL_PATH1="hdfs://cluster1:8020/path/to/source/dir/"
#export FULL_PATH2="hdfs://cluster2:8020/target/dir/"
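# count the slashes in each path; the next "/"-separated field is where
# the path relative to the base directory starts
dash="/"
i1=$(( $(grep -o "$dash" <<< "$FULL_PATH1" | wc -l) + 1 ))
i2=$(( $(grep -o "$dash" <<< "$FULL_PATH2" | wc -l) + 1 ))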
# dump paths and compare them.
# output indicates differing paths/files
# output is empty if no difference
hdfs dfs -ls -R "$FULL_PATH1" | cut -d/ -f"${i1}"- | sort > /tmp/filelist1
hdfs dfs -ls -R "$FULL_PATH2" | cut -d/ -f"${i2}"- | sort > /tmp/filelist2
# -F matches ".staging" literally instead of treating "." as a wildcard
diff /tmp/filelist1 /tmp/filelist2 | grep -vF ".staging" | grep -v "1a2,3"
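
The field arithmetic above can be checked without a cluster. The following is a minimal sketch, assuming the listing prints the full URI for each entry (which is what the script relies on). The sample path and the `sample_line` listing entry are made-up values for illustration only, not output from a real cluster.

```shell
# Hypothetical base path and one made-up line in `hdfs dfs -ls -R` style
FULL_PATH="hdfs://cluster1:8020/path/to/source/dir/"
sample_line="-rw-r--r--   3 user group 42 2018-03-08 09:07 hdfs://cluster1:8020/path/to/source/dir/sub/file.txt"

# Number of "/" characters in the base path (grep -o prints one match per line)
slashes=$(printf '%s\n' "$FULL_PATH" | grep -o "/" | wc -l)

# The relative path starts in the field right after the last slash of the base path
field=$(( slashes + 1 ))

# Cutting from that field onward drops permissions, owner, date and base path
relative=$(printf '%s\n' "$sample_line" | cut -d/ -f"${field}"-)
echo "$relative"
```

Because everything before the relative path is cut away, two listings differ only when a relative path exists on one side and not the other. Differences in size, owner, or timestamp are not detected by this comparison.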