@NetBUG
Created May 29, 2017 10:29
Aligners
Usage: align dictfile.txt source translate
Russian tokenization has not been implemented yet, using generic scripts...
Read total: 4860962
Reading X stop list... done.
Reading seed translation lexicon... done.
Number of entries: 29638
Loading axis...done.
Number of sentences: 31
Loading axis...done.
Number of sentences: 25
Window size: 10
Aligning Sentences ... done.
~/projects/skuuper/skuuper-cat/cleaner/thirdparty/aligner_ch master*> cat output.align
1 <=> 1
2 <=> 2
3 <=> 3
4 <=> 4
5 <=> omitted
6,7 <=> 5,6
8,9 <=> 7
10 <=> 8,9
11 <=> omitted
12 <=> 10,11
13 <=> omitted
14 <=> omitted
15 <=> 12
16 <=> 13
17,18 <=> 14
19,20 <=> 15
21 <=> 16
22 <=> 17
23 <=> 18
24 <=> omitted
25 <=> omitted
26 <=> 19,20
27 <=> 21
28 <=> 22,23
29 <=> omitted
30 <=> 24
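In the `output.align` listing above, each line pairs sentence numbers from the two inputs across `<=>`. Comma-separated numbers mark several sentences merged into one bead, and `omitted` marks a sentence with no counterpart. Below is a minimal Python sketch of a reader for that syntax. It assumes the left side refers to the first text file (`source` in the usage line) and that `omitted` may appear on either side. `parse_bead` and `bead_types` are hypothetical helper names and are not part of the aligner.

```python
from collections import Counter


def parse_bead(line):
    """Parse one alignment line such as '6,7 <=> 5,6' or '5 <=> omitted'.

    Returns (source_ids, target_ids) as tuples of 1-based sentence numbers.
    An 'omitted' side becomes an empty tuple (assumed to be allowed on
    either side of '<=>').
    """
    left, right = (side.strip() for side in line.split("<=>"))

    def ids(side):
        if side == "omitted":
            return ()
        return tuple(int(tok) for tok in side.split(","))

    return ids(left), ids(right)


def bead_types(lines):
    """Count bead shapes like '1-1', '2-1', or '1-0' (source sentence omitted)."""
    counts = Counter()
    for line in lines:
        if "<=>" not in line:
            continue  # ignore shell prompts and other non-alignment lines
        src, tgt = parse_bead(line)
        counts[f"{len(src)}-{len(tgt)}"] += 1
    return counts


if __name__ == "__main__":
    sample = ["1 <=> 1", "5 <=> omitted", "6,7 <=> 5,6", "8,9 <=> 7", "10 <=> 8,9"]
    print(bead_types(sample))
```

Applied to the full listing above, the tally comes to 11 one-to-one beads, 7 source sentences marked `omitted`, 4 one-to-two beads, 3 two-to-one beads and 1 two-to-two bead. That gives a quick way to compare runs made with different settings.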
Reading dictionary...
86 sentences read in language 1.
88 sentences read in language 2.
quasiglobal_stopwordRemoval is set to 0
Simplified dictionary ready.
Rough translation ready.
0
Rough translation-based similarity matrix ready.
Matrix built.
Trail found.
Align ready.
Global quality of unfiltered align 0.168867
quasiglobal_spaceOutBySentenceLength is set to 1
Trail spaced out by sentence length.
Global quality of unfiltered align after realign 0.168867
0 0 0.120531
1 1 0.0825085
2 2 0.28
3 3 0.3
4 4 0.24
5 5 0.222581
6 6 0.290323
7 7 0.257143
8 8 0.295522
9 9 0.114826
10 10 0.168595
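The rows at the end of the second log appear to pair sentence indices with a score. The sketch below assumes each row has the form `source_index target_index score`, with 0-based indices as the first row `0 0` suggests. Under that assumption, it reads the rows out of a captured log and flags weakly scored pairs for manual review. `LadderRow`, `parse_ladder`, `weak_pairs` and the threshold value are hypothetical illustrations and are not options of the aligner itself.

```python
from typing import List, NamedTuple


class LadderRow(NamedTuple):
    src: int      # source sentence index (assumed 0-based)
    tgt: int      # target sentence index (assumed 0-based)
    score: float  # per-pair score as printed in the log


def parse_ladder(lines: List[str]) -> List[LadderRow]:
    """Keep only lines made of exactly two integers and one float."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # progress messages, bare counters, etc.
        try:
            rows.append(LadderRow(int(parts[0]), int(parts[1]), float(parts[2])))
        except ValueError:
            continue  # three tokens, but not numeric (e.g. "Window size: 10")
    return rows


def weak_pairs(rows: List[LadderRow], threshold: float) -> List[LadderRow]:
    """Return rows scoring below a chosen (hypothetical) threshold."""
    return [row for row in rows if row.score < threshold]


if __name__ == "__main__":
    log = ["Trail found.", "0 0 0.120531", "1 1 0.0825085", "2 2 0.28"]
    for row in weak_pairs(parse_ladder(log), 0.15):
        print(row.src, row.tgt, row.score)
```

Filtering by token shape rather than by position lets the whole captured log be passed in unchanged. The threshold has to be tuned by inspecting real pairs.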