@Condla
Created March 8, 2018 09:07
#!/bin/bash
# set the full source and target paths, including the cluster names;
# each path must end with a trailing "/" (!), because the field count below relies on it
#export FULL_PATH1="hdfs://cluster1:8020/path/to/source/dir/"
#export FULL_PATH2="hdfs://cluster2:8020/target/dir/"
# count the slashes in each path; the relative path starts one field after them
slash="/"
i1=$(( $(grep -o "$slash" <<< "$FULL_PATH1" | wc -l) + 1 ))
i2=$(( $(grep -o "$slash" <<< "$FULL_PATH2" | wc -l) + 1 ))
# dump paths and compare them.
# output indicates differing paths/files
# output is empty if no difference
hdfs dfs -ls -R "$FULL_PATH1" | cut -d/ -f"${i1}"- | sort > /tmp/filelist1
hdfs dfs -ls -R "$FULL_PATH2" | cut -d/ -f"${i2}"- | sort > /tmp/filelist2
# ignore .staging entries and the "1a2,3" hunk header
diff /tmp/filelist1 /tmp/filelist2 | grep -v '\.staging' | grep -v '1a2,3'
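
The slash counting is the least obvious step of the script, so here is a minimal sketch of it that runs without a cluster. The helper names `field_index` and `strip_prefix` and the sample listing line are made up for illustration; the line only imitates the shape of `hdfs dfs -ls -R` output when a full URI is passed, and is not captured from a real cluster.

```shell
#!/bin/bash
# Hypothetical helper: index of the first path field after the base path.
# Counts the "/" characters in the base path and adds 1, like i1 and i2 above.
field_index() {
  local slashes
  slashes=$(tr -cd '/' <<< "$1")   # keep only the "/" characters
  echo $(( ${#slashes} + 1 ))
}

# Hypothetical helper: reduce listing lines on stdin to paths relative to the base.
strip_prefix() {
  cut -d/ -f"$(field_index "$1")"-
}

# Made-up sample line (assumed format); permissions, size and date contain no "/".
sample='-rw-r--r--   3 hdfs hdfs   1024 2018-03-08 09:07 hdfs://cluster1:8020/path/to/source/dir/sub/part-00000'
base='hdfs://cluster1:8020/path/to/source/dir/'

idx=$(field_index "$base")
rel=$(strip_prefix "$base" <<< "$sample")
echo "$idx $rel"   # prints: 8 sub/part-00000
```

Because permissions, size and timestamp all sit in the first `/`-separated field, they are cut away together with the cluster prefix. The comparison therefore covers relative paths only, not file sizes or modification times. Without the trailing "/" on the base path the index would be one too low, and the last directory name of the base would stay in every line.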