@eiriks
Forked from ljos/OBT-stemmer.sh
Last active August 29, 2015 14:17
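#!/usr/bin/env bash
# Reads OBT (presumably Oslo-Bergen Tagger) output on stdin and prints one
# lowercased lemma per token. As the expressions below imply, each token is
# assumed to span exactly three non-blank lines: a <word>...</word> line,
# a "<form>" line, and a tab-indented "lemma" line followed by its tags.
# Needs GNU sed: \s, \t and \L are GNU extensions.
sed '/^\s*$/d' \
| paste -d '\t\0' - - - \
| sed -e 's/\([^"]*\)$/\t\1/' \
-e 's,<word>\(.*\)</word>,\1,' \
-e 's/"<\(.*\)>"\t"\(.*\)"/\1\t\2/' \
| cut -f3 \
| sed 's/./\L\0/g'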
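A minimal usage sketch, assuming GNU sed and coreutils. The function names `obt_lemmas` and `sample` are hypothetical, and the two-token input is invented to match the three-lines-per-token layout the regexes expect; it is not real tagger output. The pipeline is the one from the script, wrapped in a function so it can be fed a sample.

```shell
#!/usr/bin/env bash
# obt_lemmas: hypothetical wrapper around the pipeline from the script.
obt_lemmas() {
  sed '/^\s*$/d' \
  | paste -d '\t\0' - - - \
  | sed -e 's/\([^"]*\)$/\t\1/' \
        -e 's,<word>\(.*\)</word>,\1,' \
        -e 's/"<\(.*\)>"\t"\(.*\)"/\1\t\2/' \
  | cut -f3 \
  | sed 's/./\L&/g'   # & is the documented whole-match reference; the script writes \0
}

# sample: invented input, two tokens separated by a blank line.
# Each token is three lines; the third starts with a tab.
sample() {
  printf '<word>Hunden</word>\n"<hunden>"\n\t"hund" subst appell mask be ent\n'
  printf '\n'
  printf '<word>Oslo</word>\n"<oslo>"\n\t"Oslo" subst prop\n'
}

sample | obt_lemmas
```

On this sample the pipeline should print `hund` and `oslo`, one per line. If a token carries more than one reading line, the three-way `paste` drifts out of step with the token boundaries, so the sketch only holds for input with exactly one reading per token.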