@kanaka
Last active December 30, 2015 12:18
bash lisp tokenizer NOTE: doesn't handle \" within strings yet
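#!/bin/bash
# Pass the Lisp source file as $1; prints one token per line.
wholefile=$(< "$1")
filelen=${#wholefile}
chunk=0
chunksz=500
str=""
# Alternatives: string, paren, bare symbol/number, whitespace run.
token_re='^("[^"]*")|^([()])|^([^[:space:]"()]+)|^[[:space:]]+'
while true; do
    # Top up the buffer when it runs low and input remains. Tokens of
    # chunksz/2 characters or more can still be cut at the buffer boundary.
    if (( ${#str} < chunksz / 2 && chunk < filelen )); then
        str="${str}${wholefile:${chunk}:${chunksz}}"
        chunk=$(( chunk + chunksz ))
    fi
    (( ${#str} == 0 )) && break
    # Nothing matches at the front of the buffer (e.g. an unterminated string).
    if ! [[ "${str}" =~ ${token_re} ]]; then
        echo >&2 "Error at: ${str:0:50}"
        exit 2
    fi
    match=${BASH_REMATCH[0]}
    str="${str:${#match}}"
    # Print the token unless the match was only whitespace.
    [ -n "${match//[[:space:]]/}" ] && printf '%s\n' "${match}"
done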
# much faster (needs GNU sed for \| and for \n in the replacement):
# sed 's/\("[^"][^"]*"\)\|\([()]\)\|\([^ "()][^ "()]*\)\| */<\1.\2.\3>\n/g' "$1"
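The NOTE above says \" inside strings is not handled yet. One possible way to cover it, as a rough sketch rather than the gist's own method, is to let the string alternative accept either an ordinary character or a backslash followed by any character. The function name `tokenize` and the choice to take the input as a string argument (no chunking) are assumptions made here to keep the example small.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script: tokenizes the string
# given as $1 and prints one token per line.
tokenize() {
    local str=$1 match
    # String alternative: a quote, then any run of (non-quote, non-backslash)
    # characters or backslash escapes such as \" and \\, then a closing quote.
    local re='^"([^"\\]|\\.)*"|^[()]|^[^[:space:]"()]+|^[[:space:]]+'
    while (( ${#str} > 0 )); do
        if ! [[ $str =~ $re ]]; then
            echo >&2 "Error at: ${str:0:50}"
            return 2
        fi
        match=${BASH_REMATCH[0]}
        str=${str:${#match}}
        # Skip matches that consist only of whitespace.
        [[ -n ${match//[[:space:]]/} ]] && printf '%s\n' "$match"
    done
    return 0
}
```

Calling `tokenize '(print "a \"b\" c")'` should emit the string literal, escapes included, as a single token. The regex is kept in a variable so the backslashes reach the regex engine unchanged, which tends to be less fragile than escaping them inline inside `[[ ... =~ ... ]]`.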