@peaeater
Last active November 25, 2015 16:17
Takes a text input file and, by default, produces a tab-delimited (TSV) output file. The output has no header row, but it always contains the same three columns in the same order.
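<#
Runs Stanford NER over a text file and writes the tagged output to a file.
Requires Stanford NER, Java 1.8+
formats = slashTags, inlineXML, xml, tsv, tabbedEntities
models = 3class, 4class, 7class
#>
param(
    # Input text file
    [Parameter(Mandatory=$true,Position=0)]
    [string]$file,
    # Output file
    [Parameter(Mandatory=$true,Position=1)]
    [string]$outfile,
    [string]$format = "tabbedEntities",
    [string]$model = "3class",
    # Directory containing stanford-ner.jar and the classifiers folder
    [string]$jardir = "..\stanford-ner"
)

# Map the model name to its classifier file; unrecognized names fall back to 3class.
$classifier = switch ($model) {
    '3class' {"english.all.3class.distsim.crf.ser.gz"; break}
    '4class' {"english.conll.4class.distsim.crf.ser.gz"; break}
    '7class' {"english.muc.7class.distsim.crf.ser.gz"; break}
    default  {"english.all.3class.distsim.crf.ser.gz"; break}
}

# Run the CRF classifier with a 1500 MB heap and redirect its stdout to the output file.
# Note: in Windows PowerShell, ">" writes the file as UTF-16LE.
java -mx1500m -cp (join-path $jardir stanford-ner.jar) edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier (join-path $jardir classifiers\$classifier) -outputFormat $format -textFile $file > $outfile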
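
A minimal usage sketch, assuming the script above is saved as `ner.ps1` (the gist does not name the file) and that Stanford NER is unpacked in `..\stanford-ner`. The input and output paths are placeholders. The column names `Entity`, `Type`, and `Text` are also assumptions made for readability: the description only says that the default output has three headerless columns, so check them against your own output before relying on them.

```powershell
# Hypothetical file names: the gist does not name the script or its inputs.
$script  = '.\ner.ps1'
$infile  = '.\article.txt'
$outfile = '.\article.entities.tsv'

# The two positional arguments bind to -file and -outfile;
# -model selects the classifier and -jardir points at the Stanford NER folder.
& $script $infile $outfile -model 7class -jardir '..\stanford-ner'

# The output has no header row, so split each line on tabs and name the
# columns ourselves. The names below are assumptions, not part of the output.
$rows = Get-Content -Path $outfile | ForEach-Object {
    $cols = $_ -split "`t", 3
    [pscustomobject]@{
        Entity = $cols[0]
        Type   = $cols[1]
        Text   = $cols[2]
    }
}

# List the distinct values of the first two columns, skipping rows
# whose first column is empty.
$rows |
    Where-Object { $_.Entity } |
    Sort-Object Entity, Type -Unique |
    Format-Table Entity, Type
```

Splitting on tabs by hand, rather than using `Import-Csv`, avoids having double quotes in the source text treated as CSV quoting characters.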