Last active November 25, 2015 16:17
Takes a text input file and, by default, produces a tab-delimited output file. The output has no header row, but its three columns always appear in the same order.
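Because the output has no header row, column names have to be supplied when reading it back. The sketch below shows one way to do that in PowerShell. The sample rows and the column names `Entity`, `Type`, and `Text` are assumptions made for illustration: they reflect a common reading of the `tabbedEntities` layout (entity, its class, then following non-entity text) and should be checked against real output.

```powershell
# Hypothetical sample rows standing in for the script's output file.
# The column meanings (entity, class, trailing text) are an assumption.
$sample = @(
    "Barack Obama`tPERSON`twas born in"
    "Hawaii`tLOCATION`t."
)

# Without -Header, ConvertFrom-Csv would treat the first row as column names.
$rows = $sample | ConvertFrom-Csv -Delimiter "`t" -Header Entity, Type, Text

# Keep only the rows whose first column is non-empty.
$entities = @($rows | Where-Object { $_.Entity })

$entities | Format-Table Entity, Type
```

For a real output file, `Import-Csv -Path $outfile -Delimiter "`t" -Header Entity, Type, Text` should behave the same way. Text containing double quotes may be parsed unexpectedly by the CSV cmdlets, so splitting each line on the tab character is a possible alternative.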
<#
Requires Stanford NER, Java 1.8+
formats = slashTags, inlineXML, xml, tsv, tabbedEntities
#>
param(
    [Parameter(Mandatory=$true,Position=0)]
    [string]$file,
    [Parameter(Mandatory=$true,Position=1)]
    [string]$outfile,
    [string]$format = "tabbedEntities",
    [string]$model = "3class",
    [string]$jardir = "..\stanford-ner"
)
$classifier = switch ($model) {
    3class {"english.all.3class.distsim.crf.ser.gz"; break}
    4class {"english.conll.4class.distsim.crf.ser.gz"; break}
    7class {"english.muc.7class.distsim.crf.ser.gz"; break}
    default {"english.all.3class.distsim.crf.ser.gz"; break}
}
java -mx1500m -cp (join-path $jardir stanford-ner.jar) edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier (join-path $jardir classifiers\$classifier) -outputFormat $format -textFile $file > $outfile
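The script assumes that `stanford-ner.jar` and the chosen classifier file exist under `$jardir`. If they do not, Java fails with its own error. A pre-flight check along the following lines could report the problem earlier. This is a minimal sketch only: the function name `Get-MissingNerFile` is hypothetical and not part of the original script, and the default values simply mirror the script's own defaults.

```powershell
# Hypothetical helper, not part of the original script.
# Returns the required Stanford NER files that are missing under $jardir.
function Get-MissingNerFile {
    param(
        [string]$jardir = "..\stanford-ner",
        [string]$classifier = "english.all.3class.distsim.crf.ser.gz"
    )

    # The two files the java command line refers to.
    $required = @(
        (Join-Path $jardir "stanford-ner.jar"),
        (Join-Path (Join-Path $jardir "classifiers") $classifier)
    )

    # Emit only the paths that do not exist.
    $required | Where-Object { -not (Test-Path -LiteralPath $_) }
}

$missing = @(Get-MissingNerFile -jardir "..\stanford-ner")
if ($missing.Count -gt 0) {
    Write-Warning ("Missing Stanford NER files: " + ($missing -join ", "))
}
```

With the files in place, the script might be invoked as `.\ner.ps1 input.txt entities.tsv -model 7class`, where `ner.ps1` is a placeholder for whatever name the script is saved under.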