@loretoparisi
Last active September 21, 2018 15:42
Invert Indic to Roman
#!/usr/bin/env bash
# ^^^^- not compatible with /bin/sh
IN=$1
OUT=$2
paste <(<$IN awk '{print $2}') \
<(<$IN sed -E 's/^[^[:space:]]*[[:space:]]//' \
| indictrans -s asm -t eng --ml --build-lookup) \
> $OUT

loretoparisi commented Sep 21, 2018

Example usage:

./convert_roman_all.sh wiki.kn.txt wiki.kn.roman.txt

Example input data:

KN	 ಐಕ್ಯತೆ ಕ್ಷೇಮಾಭಿವೃದ್ಧಿ ಸಂಸ್ಥೆ  ವಿಜಯಪುರ
KN	 ಹೊರಗಿನ ಸಂಪರ್ಕಗಳು 
KN	  ಮಕ್ಕಳ ಸಾಹಿತ್ಯ ಮತ್ತು ಸಾಂಸ್ಖ್ರುತಿಕ ಕ್ಷೇತ್ರದಲ್ಲಿ ಸೇವೆ ಸಲ್ಲಿಸುತ್ತಿರುವ ಸಂಸ್ಠೆ ಮಕ್ಕಳ ಲೋಕ 

Expected output data:

KN	 aikyate kshemabhivruddhi sansthe  vijayapur
KN	 horgin samparkagalu 
KN	  makkal sahitya mattu sanskhruthik kshetradalli seve sallisuttiruv sansthe makkal lok 

Actual output data:

ಐಕ್ಯತೆ	 ಐಕ್ಯತೆ ಕ್ಷೇಮಾಭಿವೃದ್ಧಿ ಸಂಸ್ಥೆ  ವಿಜಯಪುರ
ಹೊರಗಿನ	 ಹೊರಗಿನ ಸಂಪರ್ಕಗಳು 
ಮಕ್ಕಳ	  ಮಕ್ಕಳ ಸಾಹಿತ್ಯ ಮತ್ತು ಸಾಂಸ್ಖ್ರುತಿಕ ಕ್ಷೇತ್ರದಲ್ಲಿ ಸೇವೆ ಸಲ್ಲಿಸುತ್ತಿರುವ ಸಂಸ್ಠೆ ಮಕ್ಕಳ ಲೋಕ  

Please see this SF question for more info about this task.

charles-dyfis-net commented Sep 21, 2018

The code in your gist has awk '{print $2}', but it needs to be (and was, in my original answer) awk '{print $1}'.
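A minimal sketch of the difference, using the first line of the example input above. It needs only awk, not indictrans, and the variable names are made up for illustration:

```bash
#!/usr/bin/env bash
# One line of the example input: a tag, a tab, then the text to transliterate.
line=$(printf 'KN\t %s' 'ಐಕ್ಯತೆ ಕ್ಷೇಮಾಭಿವೃದ್ಧಿ ಸಂಸ್ಥೆ  ವಿಜಯಪುರ')

# awk splits on runs of whitespace by default, so field 1 is the tag
# and field 2 is the first word of the text.
first=$(printf '%s\n' "$line" | awk '{print $1}')
second=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'field 1: %s\n' "$first"    # the tag you want in the first column
printf 'field 2: %s\n' "$second"   # the first word of the text, as seen in the actual output
```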

That explains the first column. The rest of the output comes from indictrans, which is not an easy piece of software to build (its setup.py is missing a number of dependencies, and it also requires a libAccelerate that is neither included nor downloaded automatically), so I'm not in a position to help you with its issues.

BTW, all-caps names are used for variables meaningful to the shell or other POSIX-defined tools, whereas lowercase names are guaranteed not to conflict when used by applications; see http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html
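As a rough sketch, the script might look like this with field 1 used for the label and lowercase variable names, plus quoted expansions so file names containing spaces still work. The function name romanize_file and the variable names are made up for illustration, and the indictrans flags are copied unchanged from the gist rather than verified:

```bash
#!/usr/bin/env bash
# Sketch only: same pipeline as the gist, wrapped in a function.
# Needs bash (process substitution), not /bin/sh.

romanize_file() {
  local in_file=$1 out_file=$2

  # Column 1: the tag (field 1) of each input line.
  # Column 2: the rest of the line, piped through indictrans.
  paste <(awk '{print $1}' <"$in_file") \
        <(sed -E 's/^[^[:space:]]*[[:space:]]//' <"$in_file" \
          | indictrans -s asm -t eng --ml --build-lookup) \
        >"$out_file"
}

# Run only when called with an input and an output path.
if [ "$#" -eq 2 ]; then
  romanize_file "$1" "$2"
fi
```

Saved as convert_roman_all.sh, it would be called the same way as before: ./convert_roman_all.sh wiki.kn.txt wiki.kn.roman.txt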
