@damionjunk
Created November 6, 2012 19:13
Generate a gold-standard bracketing file and a plain-text sentence file for evalb (Assignment 8)
(ns ptreader
  (:require [clojure.java.io :as io]
            [clojure.string :as s])
  (:import [edu.stanford.nlp.trees PennTreeReader]))

(defn get-sentence
  "Join the words of a tree's yield into one space-separated sentence."
  [words]
  (s/join " " (map (fn [w] (.word w)) words)))

(defn extract
  "Read every tree in the .mrg file at in-path. Write each tree's bracketing
  on its own line to gold-path and its space-joined yield to text-path."
  [in-path gold-path text-path]
  (with-open [rdr  (io/reader in-path)
              gold (io/writer gold-path)
              text (io/writer text-path)]
    (let [ptr (PennTreeReader. rdr)]
      ;; Start the loop with the first tree; .readTree returns nil at end of input.
      (loop [tree (.readTree ptr)]
        (when tree
          (.write text (str (get-sentence (.yieldWords tree)) "\n"))
          (.write gold (str (.toString tree) "\n"))
          (recur (.readTree ptr)))))))

(comment
  (extract "/Users/djunk/projects/L645/hw08/wsj_all.mrg"
           "/Users/djunk/projects/L645/hw08/wsj00-gold.mrg"
           "/Users/djunk/projects/L645/hw08/wsj00-text.txt"))
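The same extraction can be illustrated without the Stanford CoreNLP dependency. What follows is a minimal pure-Clojure sketch, not how PennTreeReader actually works. It assumes well-formed Penn Treebank bracketings with no parentheses inside tokens (the treebank writes those as -LRB-/-RRB-). The namespace ptb-sketch and the helpers tokenize, split-trees, tree-yield, one-line and gold-and-text are hypothetical names introduced for illustration. The gold line is only a whitespace-normalized bracketing and is not guaranteed to match what Tree.toString prints character for character.

```clojure
(ns ptb-sketch
  "Hypothetical, dependency-free sketch: split Penn Treebank text into trees,
  then emit a one-line bracketing and the space-joined yield for each tree."
  (:require [clojure.string :as s]))

(defn tokenize
  "Split bracketed text into \"(\", \")\" and atom tokens."
  [text]
  (re-seq #"\(|\)|[^\s()]+" text))

(defn split-trees
  "Group tokens into one vector per top-level tree by tracking bracket
  depth. Assumes balanced brackets."
  [tokens]
  (loop [toks tokens, depth 0, cur [], trees []]
    (if-let [[t & more] (seq toks)]
      (let [depth' (case t "(" (inc depth) ")" (dec depth) depth)
            cur'   (conj cur t)]
        (if (and (= t ")") (zero? depth'))
          (recur more 0 [] (conj trees cur'))   ; a top-level tree just closed
          (recur more depth' cur' trees)))
      trees)))

(defn tree-yield
  "Leaf words of one tokenized tree. An atom directly after \"(\" is a
  node label; every other atom is a word."
  [toks]
  (keep (fn [[prev tok]]
          (when-not (or (#{"(" ")"} tok) (= prev "("))
            tok))
        (map vector (cons nil toks) toks)))

(defn one-line
  "Re-join a tree's tokens as a single-line bracketing."
  [toks]
  (-> (s/join " " toks)
      (s/replace "( " "(")
      (s/replace " )" ")")))

(defn gold-and-text
  "Return one [gold-line text-line] pair per tree found in text."
  [text]
  (for [toks (split-trees (tokenize text))]
    [(one-line toks) (s/join " " (tree-yield toks))]))
```

For example, (gold-and-text "( (S (NP (PRP It)) (VP (VBD ran)) (. .)) )") returns a single pair whose second element is "It ran .". Separating labels from words by position (an atom right after "(" is a label) avoids needing a full tree data structure, which is enough for computing a yield.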