@strubell
Created December 18, 2014 00:05
import cc.factorie.util.CmdOptions

// Command-line options for processing a TAC slot-filling corpus.
class ProcessSlotFillingCorpusOpts extends CmdOptions {
  // Where to find the input documents
  val dataDirs = new CmdOption[List[String]]("data-dirs", List.empty[String], "FILENAME...", "List of directories containing (only) data files in SGML format.")
  val dataFiles = new CmdOption[List[String]]("data-files", List.empty[String], "FILENAME...", "List of files in SGML format.")
  val dataFilesFile = new CmdOption("data-files-file", "", "FILENAME", "File containing a list of paths to data files, one per line.")
  // Processing behavior and output
  val reprocess = new CmdOption("reprocess", false, "BOOL", "Whether to re-process documents that are already serialized.")
  val outputDir = new CmdOption("output-dir", "", "FILENAME", "Directory to which processed documents are serialized.")
  val inputType = new CmdOption("input-type", "filename", "STRING", "Type of the input: filename, docid, or document. document: plain-text serialized documents; filename: TAC corpus filenames; docid: look up documents in the original corpus by docid.")
  val retag = new CmdOption("retag-type", "none", "STRING", "Whether to simply re-tag already-serialized documents (files-are-docs is assumed to be true), and how: slotfilling, event, parse, or none.")
  // Only needed for input-type=docid
  val corpusLocation = new CmdOption("corpus-location", "", "STRING", "Path to the original data files (TAC corpus); used with input-type=docid.")
  val docIdMapLocation = new CmdOption("docid-map-location", "", "STRING", "Path to a file mapping docids to offsets and original files; used with input-type=docid.")
  val numThreads = new CmdOption("threads", 24, "INT", "Number of threads to use.")
}
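The help text implies that some options only make sense together (corpus-location and docid-map-location are "used with input-type=docid") and that input-type and retag-type accept a fixed set of values. The following is a minimal, dependency-free sketch of checking such a combination after parsing. `SlotFillingSettings` and `validate` are hypothetical names, not part of FACTORIE, and treating the two docid paths as required is an assumption drawn from the help strings rather than something the original code enforces.

```scala
// Hypothetical stand-in for the parsed option values; field names mirror the
// CmdOption flags above, but this is NOT FACTORIE's API.
case class SlotFillingSettings(
    inputType: String = "filename",
    retagType: String = "none",
    corpusLocation: String = "",
    docIdMapLocation: String = "",
    numThreads: Int = 24)

object SlotFillingSettings {
  // Allowed values, taken from the help strings of input-type and retag-type.
  val InputTypes = Set("filename", "docid", "document")
  val RetagTypes = Set("slotfilling", "event", "parse", "none")

  /** Returns a list of problems; an empty list means the combination looks consistent. */
  def validate(s: SlotFillingSettings): List[String] = {
    val errors = List.newBuilder[String]
    if (!InputTypes.contains(s.inputType))
      errors += s"unknown --input-type '${s.inputType}'"
    if (!RetagTypes.contains(s.retagType))
      errors += s"unknown --retag-type '${s.retagType}'"
    // Assumption: both locations are mandatory when documents are looked up by docid.
    if (s.inputType == "docid") {
      if (s.corpusLocation.isEmpty) errors += "--corpus-location is required with --input-type=docid"
      if (s.docIdMapLocation.isEmpty) errors += "--docid-map-location is required with --input-type=docid"
    }
    if (s.numThreads < 1) errors += "--threads must be at least 1"
    errors.result()
  }
}
```

With the defaults (input-type=filename, retag-type=none, 24 threads) the list is empty; switching to input-type=docid without the two paths yields two errors.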
@strubell (author) commented:

Then, later: val opts = new ProcessSlotFillingCorpusOpts
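Once the options are parsed, the three input options (data-dirs, data-files, data-files-file) presumably have to be merged into a single list of documents to process. A minimal sketch of that step, assuming plain string values are passed in rather than the CmdOption objects themselves; `InputFiles.collect` is a hypothetical helper, and de-duplicating the combined list is a design choice of this sketch, not behavior stated in the gist.

```scala
import java.io.File
import scala.io.Source

object InputFiles {
  /** Hypothetical helper: merges --data-dirs, --data-files and --data-files-file into one list of paths. */
  def collect(dataDirs: List[String], dataFiles: List[String], dataFilesFile: String): List[String] = {
    // Every regular file in each directory (the help text says the dirs contain only data files).
    // listFiles() returns null for a missing directory, hence the Option wrapper.
    val fromDirs = dataDirs.flatMap { dir =>
      Option(new File(dir).listFiles()).map(_.toList).getOrElse(Nil)
        .filter(_.isFile).map(_.getPath).sorted
    }
    // One path per line in the list file, skipping blank lines.
    val fromListFile =
      if (dataFilesFile.isEmpty) Nil
      else {
        val src = Source.fromFile(dataFilesFile)
        try src.getLines().map(_.trim).filter(_.nonEmpty).toList
        finally src.close()
      }
    // Keep the first occurrence of each path so no document is processed twice.
    (fromDirs ++ dataFiles ++ fromListFile).distinct
  }
}
```

Directory contents come first (sorted per directory), then explicitly listed files, then paths from the list file.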

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment