@pgwillia
Last active November 29, 2017 14:15
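# Count how often each location occurs, most frequent first (run once corpus_locations.csv has been built)
sort corpus_locations.csv | uniq -c | sort -nr > corpus_locations_count.tsv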
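Note that `uniq -c` pads its counts with spaces rather than writing tabs, so the `.tsv` file is not strictly tab-separated. Below is a minimal sketch of one way to get real `count<TAB>location` rows, assuming the same one-location-per-line input. The `count_locations` name and the demo data are hypothetical and not part of the original gist.

```shell
# Hypothetical helper: count duplicate lines in the file given as $1 and
# emit "count<TAB>value" rows, most frequent first.
count_locations() {
  sort "$1" | uniq -c | sort -nr |
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); printf "%s\t%s\n", n, $0 }'
}

# Small self-contained demo with made-up data
demo_file=$(mktemp)
printf 'Edmonton\nNew York\nEdmonton\n' > "$demo_file"
count_locations "$demo_file"
```

Stripping the count with `sub` instead of printing `$2` keeps multi-word locations such as "New York" intact.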
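# Install the xpath command-line tool (on Debian/Ubuntu it ships in the libxml-xpath-perl package)
apt-get install libxml-xpath-perl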
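# Extract the text of every <note> element from each XML file into <file>.txt
# (bash; the null-delimited read keeps file names with spaces intact)
find . -iname '*.xml' -print0 | while IFS= read -r -d '' file
do
echo "$file"
xpath -e '//note/text()' "$file" > "$file.txt"
done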
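# Collect the locations tagged by OpenNLP from every .ner file into corpus_locations.csv
# (bash; the null-delimited read keeps file names with spaces intact)
find . -iname '*.ner' -print0 | while IFS= read -r -d '' file
do
echo "$file"
# Even-numbered fields are the tagged locations; lines without a tag are skipped
awk -F 'START:location> | <END' 'NF > 1 { for (i = 2; i <= NF; i += 2) print $i }' "$file" >> corpus_locations.csv
done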
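The extraction step relies on the name finder marking each entity inline as `<START:location> ... <END>`. Below is a small sketch of pulling every marked span out of a line, shown on a made-up sample sentence. The `extract_locations` name and the sample text are illustrative assumptions, not part of the original gist.

```shell
# Hypothetical helper: read OpenNLP-tagged text on stdin and print one
# location per line. Splitting on the start and end markers puts each
# tagged span in an even-numbered field; untagged lines have NF == 1.
extract_locations() {
  awk -F 'START:location> | <END' 'NF > 1 { for (i = 2; i <= NF; i += 2) print $i }'
}

# Made-up sample input: one line with two locations, one line with none
printf '%s\n' \
  'He moved from <START:location> Edmonton <END> to <START:location> New York <END> .' \
  'No places here .' | extract_locations
```

Including the surrounding spaces in the field separator means the printed names carry no leading or trailing blanks, which keeps later `sort | uniq -c` counts from splitting on stray whitespace.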
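# Run the OpenNLP location name finder on each extracted text file, writing <file>.ner
# (bash; the null-delimited read keeps file names with spaces intact)
find . -iname '*.txt' -print0 | while IFS= read -r -d '' file
do
echo "$file"
/root/apache-opennlp-1.5.3/bin/opennlp TokenNameFinder /root/apache-opennlp-1.5.3/bin/en-ner-location.bin < "$file" > "$file.ner"
done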