@drvenabili
Created February 15, 2017 15:43
-a abcmdef ## $mode : The mode specifies which submodules will be run.
-b TXT ## $texttype : The type of your input files. One of IM (images), PDF (images in PDF files), TXT (plain text files), XML (an XML format), FOLIA (FoLiA XML format), TSV (a frequency file: word type, tab, frequency).
-z /home/sigmund/git/TICCL/ticclops/ ## $ROOTDIR : The directory where your version of the TICCL system files is located.
-c /home/sigmund/git/TICCL/data/int/nld/nld.aspell.dict.c20.d2.confusion ## $charconfus : A file listing the particular character confusions the system will gather word pairs for.
-d empty.txt ## $KHC : The name of the Known Historical Confusions file (if you have one). If not, create an empty file in TICCL's root directory.
-e xml ## $ext : The extension ending your input file names. Can be single, e.g. '.xml', or double, e.g. '.folia.xml'.
-f 100000000 ## $artifrq : The artificial frequency. Should be higher than the highest word frequency in your input files' frequency list. Typically set at '100000000' (i.e. one hundred million).
-g /home/sigmund/git/TICCL/data/int/nld/nld.aspell.dict.lc.chars ## $alph : The alphabet as derived for your language on the basis of a lexicon or corpus frequency list.
#-i /home/sigmund/git/TICCL/data/int/nld ## $INPUTDIR : Directory where system input files such as alphabet, character confusions file and lexicon are to be found.
-i /home/sigmund/Desktop/Vooruit/vooruit_preprocessed ## $dir : Directory from which files to be corrected are to be read
-l /home/sigmund/git/TICCL/data/int/nld/nuTICCL.OldandINLlexandINLNamesAspell.v2.COL1.tsv ## $lex : the lexicon for your language
-L 2 ## $LD : The Levenshtein distance limit to be imposed on word pairs collected.
-o /home/sigmund/Desktop/Vooruit/OUT ## $OUTPUTDIR : the directory system output files will be written to. Will contain a dir zzz/TICCL for intermediate TICCL files and a directory zzz/FOLIA where corrected FoLiA XML files will be written to.
-j TESTTWO ## $prefix : A prefix to begin intermediate TICCL output file names with.
-r 3 ## $rank : The number of TICCL correction candidates that will be output.
-t nld ## $lang : The language your texts are written in.
-u /usr/local/bin ## $tooldir : Directory that holds the C++ modules of which TICCL is composed.
-v 30 ## $threads : the number of threads the system is allowed to set up for processing.
-x 5 ## $minlength : Minimum length of word types to be examined, in characters.
-y 50 ## $maxlength : Maximum length of word types to be examined, in characters.
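
The option lines above follow a regular shape, so a file like this can be sanity-checked before a run. The following is a minimal sketch, not part of TICCL: it assumes each active line reads `-<flag> <value> ## <comment>`, that a line starting with `#` is disabled (as with the first `-i` line), and that values contain no spaces. The function name `parse_options` and the duplicate-flag check are hypothetical choices made for illustration.

```python
import re

# Assumed line shape: "-<flag> <value> ## <comment>"; the comment part is optional.
LINE_RE = re.compile(r"^-(\w)\s+(\S+)(?:\s*##\s*(.*))?$")


def parse_options(text):
    """Return {flag: value} for every active option line in text."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out option
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {raw!r}")
        flag, value, _comment = match.groups()
        if flag in options:
            # Design choice of this sketch: refuse a flag that is set twice.
            raise ValueError(f"option -{flag} is set twice")
        options[flag] = value
    return options


# A few lines copied from the file above, comments shortened.
SAMPLE = """\
-f 100000000 ## $artifrq : The artificial frequency.
#-i /home/sigmund/git/TICCL/data/int/nld ## $INPUTDIR
-i /home/sigmund/Desktop/Vooruit/vooruit_preprocessed ## $dir
-x 5 ## $minlength
-y 50 ## $maxlength
"""

options = parse_options(SAMPLE)

# Simple consistency check: the minimum word length must not exceed the maximum.
if int(options["x"]) > int(options["y"]):
    raise ValueError("minlength (-x) is larger than maxlength (-y)")
```

Because the commented-out `-i` line is skipped, only the active `-i` value ends up in `options`; removing the leading `#` would trigger the duplicate-flag error in this sketch.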