@shiumachi
Created November 28, 2018 06:28
#!/bin/bash
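# Print usage to stderr and exit with the given status (default 0).
usage()
{
    echo "usage: hadoop-logaggr.sh [-h] file" >&2
    echo "  -h: help (this message)" >&2
    exit "${1:-0}"
}

# Parse options with getopt(1); only -h is recognized.
TEMP=$(getopt h "$@")
if [ $? -ne 0 ] ; then
    usage 1
fi
eval set -- "$TEMP"
while true ; do
    case "$1" in
        -h) usage ;;
        --) shift ; break ;;
        *) break ;;
    esac
done

# Exactly one log file argument is required.
if [ $# -ne 1 ] ; then
    usage 1
fi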
# 1st sed: delete the leading datetime, like "YYYY-MM-DD hh:mm:ss,SSS "
# 2nd sed: replace session ids (0x followed by 15 hex digits) with <session id>
# 3rd sed: replace block names blk_XXXXXXXXXX and blk_XXXXXXXXXX_XXXX with <block id>
# 4th sed: replace job, task, and attempt ids (for example, attempt_XXX_m_XXX_X becomes <attempt id>)
# 5th sed: replace 32-character hex hashes "aabbcc112233..." with <32byte hash>
# 6th sed: replace 13-digit timestamps like "1300000000000" with <timestamp>
# Optional id suffixes are matched explicitly; a ".*" alternative would swallow the rest of the line.
cat "$1" \
| sed 's/^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},[0-9]\{3\} //' \
| sed 's/0x[0-9A-Fa-f]\{15\}/<session id>/g' \
| sed -E 's/blk_-?[0-9]{1,19}(_[0-9]+)?/<block id>/g' \
| sed -E 's/(job|task|attempt)_[0-9]{12}_[0-9]{1,5}(_[mr]_[0-9]{6}(_[0-9]+)?)?/<\1 id>/g' \
| sed -E 's/[0-9a-f]{32}/<32byte hash>/g' \
| sed -E 's/[0-9]{13}/<timestamp>/g' \
| sort | uniq -c | sort -rnk1 | less
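
The normalize-then-count idea can be tried on a few lines without a real log file. The following is a minimal sketch, not part of the original script: the function names `normalize` and `sample` are hypothetical, the sample log lines are made up for illustration, and only three of the six substitutions are shown, using simplified patterns rather than the script's exact expressions.

```shell
# Hypothetical helper: strip the leading datetime, then mask block ids
# and 13-digit timestamps on every line read from stdin.
normalize() {
    sed -E \
        -e 's/^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} //' \
        -e 's/blk_-?[0-9]{1,19}(_[0-9]+)?/<block id>/g' \
        -e 's/[0-9]{13}/<timestamp>/g'
}

# Made-up sample lines (assumption: they are not taken from a real cluster).
sample() {
    cat <<'EOF'
2018-11-28 06:28:01,123 INFO Received block blk_1234567890123456789_1001
2018-11-28 06:28:02,456 INFO Received block blk_-987654321_2002
2018-11-28 06:28:03,789 WARN Slow flush at 1543386483789
EOF
}

# Identical messages collapse into one line with a count, most frequent first.
sample | normalize | sort | uniq -c | sort -rn
```

With these sample lines, the two "Received block" messages become identical once the datetime and block id are masked, so they are counted together (count 2) and listed ahead of the single WARN line.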