Created
February 15, 2017 15:43
-a abcmdef ## $mode : The mode specifies which submodules will be run.
-b TXT ## $texttype : The type of your input files. One of IM (images), PDF (images in PDF files), TXT (plain text files), XML (an XML format), FOLIA (FoLiA XML format), TSV (a frequency file: word type, tab, frequency).
-z /home/sigmund/git/TICCL/ticclops/ ## $ROOTDIR : The directory where your version of the TICCL system files is located.
-c /home/sigmund/git/TICCL/data/int/nld/nld.aspell.dict.c20.d2.confusion ## $charconfus : A file listing the particular character confusions the system will gather word pairs for.
-d empty.txt ## $KHC : The name of the Known Historical Confusions file, if you have one. If not, create an empty file in TICCL's root directory.
-e xml ## $ext : The extension ending your input file names. Can be single, e.g. '.xml', or double, e.g. '.folia.xml'.
-f 100000000 ## $artifrq : The artificial frequency. Should be higher than the highest word frequency in your input files' frequency list. Typically set at '100000000' (i.e. one hundred million).
-g /home/sigmund/git/TICCL/data/int/nld/nld.aspell.dict.lc.chars ## $alph : The alphabet as derived for your language on the basis of a lexicon or corpus frequency list.
#-i /home/sigmund/git/TICCL/data/int/nld ## $INPUTDIR : Directory where system input files such as alphabet, character confusions file and lexicon are to be found.
-i /home/sigmund/Desktop/Vooruit/vooruit_preprocessed ## $dir : Directory from which files to be corrected are to be read.
-l /home/sigmund/git/TICCL/data/int/nld/nuTICCL.OldandINLlexandINLNamesAspell.v2.COL1.tsv ## $lex : The lexicon for your language.
-L 2 ## $LD : The Levenshtein distance limit imposed on the word pairs collected.
-o /home/sigmund/Desktop/Vooruit/OUT ## $OUTPUTDIR : The directory system output files will be written to. Will contain a directory zzz/TICCL for intermediate TICCL files and a directory zzz/FOLIA where corrected FoLiA XML files will be written.
-j TESTTWO ## $prefix : A prefix to begin intermediate TICCL output file names with.
-r 3 ## $rank : The number of TICCL correction candidates that will be output.
-t nld ## $lang : The language your texts are written in.
-u /usr/local/bin ## $tooldir : Directory that holds the C++ modules of which TICCL is composed.
-v 30 ## $threads : The number of threads the system is allowed to set up for processing.
-x 5 ## $minlength : Minimum length of word types to be examined, in characters.
-y 50 ## $maxlength : Maximum length of word types to be examined, in characters.