@coderofsalvation
Created January 12, 2014 21:37
Adds DOM paths to HTML content to make grepping for an HTML value easier ( fingers crossed! :D )
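# prefixes each tag in the HTML content with its DOM path ( fingers crossed! :D )
# @param htmlstring (pipe)
# usage: curl "http://foo.com/bar.html" | htmldompath
# output: body:      <body>
#         body>a:    <a href="#foo">
#         body>a>h1: <h1>Some title
# note: every "<" starts a new output line, so closing tags such as </h1> get a line of their own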
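htmldompath(){
  # locals keep state (e.g. bodyfound) from leaking into later calls
  local allowedtags="$1" tmpfile bodyfound="" line tag lasttag pathstring # allowedtags is accepted but currently unused
  local path=()
  tmpfile="$(mktemp "${TMPDIR:-/tmp}/htmldompath.XXXXXX")" || return 1
  # join all lines, put each tag on its own line, collapse runs of blanks, drop empty lines (GNU sed)
  sed ':a;N;$!ba;s/\n/ /g' | sed 's/</\n&/g;s/[[:blank:]]\{1,\}/ /g' | sed '/^\s*$/d' > "$tmpfile"
  while read -r line; do
    [[ "$line" =~ "<body" ]] && bodyfound=1
    [[ "$line" =~ "</body" ]] && break # we are done
    [[ -z "$bodyfound" ]] && continue  # only care about data after the <body> tag
    # tag name: drop the leading "<", everything from ">" on, and any attributes; then lowercase it
    tag="${line/</}"; tag="${tag/>*/}"; tag="${tag// */}"
    tag="$(printf '%s\n' "$tag" | tr '[:upper:]' '[:lower:]')"
    [[ "${tag:0:1}" == '!' ]] && continue  # skip non-DOM tags (comments, doctype)
    [[ "${tag:0:2}" == 'br' ]] && continue # skip uninteresting tags
    # guard the lookup: indexing an empty array with -1 is an error
    lasttag=""; (( ${#path[@]} )) && lasttag="${path[${#path[@]}-1]}"
    if [[ "${tag:0:1}" == '/' ]]; then
      # closing tag: pop only when it matches the innermost open tag
      [[ -n "$lasttag" && "${tag#/}" == "$lasttag" ]] && unset "path[${#path[@]}-1]"
    else
      # opening tag: push unless the line closes it again or it is self-closing
      [[ ! "$line" =~ "/$tag" ]] && [[ ! "$line" =~ '/>' ]] && path+=("${tag/\//}")
    fi
    pathstring="${path[*]}"; pathstring="${pathstring// />}"
    printf "%-40s %s\n" "$pathstring:" "$line"
  done < "$tmpfile"
  rm -f "$tmpfile"
}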
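
Since the point of the path column is to make grepping easier, a small filter on top of it might look like the sketch below. `dompathgrep` is a hypothetical helper, not part of the original gist; it assumes the output format produced above (a path ending in ":" as the first whitespace-delimited field) and keeps only the lines whose path ends with the given tag sequence.

```shell
# dompathgrep: hypothetical helper (an assumption, not part of the gist).
# Keeps the lines of htmldompath output whose path column ends with the
# given selector, e.g. 'a>h1'. Matching is on whole tag names only.
# usage: curl "http://foo.com/bar.html" | htmldompath | dompathgrep 'a>h1'
dompathgrep(){
  local selector="$1"
  awk -v sel="$selector" '
    {
      # the path is the first field, minus its trailing colon
      p = $1; sub(/:$/, "", p)
      n = length(p); m = length(sel)
      if (m == 0 || m > n) next
      # the path must end with the selector ...
      if (substr(p, n - m + 1) != sel) next
      # ... and the selector must start at a tag boundary
      if (m == n || substr(p, n - m, 1) == ">") print
    }'
}
```

Anchoring the match at the end of the path means `dompathgrep 'a'` returns the anchors themselves rather than everything nested inside them; a plain `grep 'a>'` on the same output would return the nested lines instead.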