@loretoparisi
Created September 21, 2018 12:47
Transliterate an Indic-script CSV dataset to Roman script using indictrans https://github.com/libindic/indic-trans/tree/master/indictrans
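#!/bin/bash
# Usage: ./invert_indic2roman.sh SOURCE_LANG INPUT_FILE OUTPUT_FILE
SOURCE=$1   # source language code passed to indictrans -s
TARGET=eng  # transliterate into Roman (English) script
IN=$2
OUT=$3
# Keep the first column (the label) as-is and transliterate the rest of each line.
while read -r col1 rest; do
  printf '%s\t%s\n' "$col1" "$(indictrans -s "$SOURCE" -t "$TARGET" --ml --build-lookup <<<"$rest")"
done < "$IN" > "$OUT"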
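
The loop above starts one indictrans process per input line, which can be slow on a large dataset. As a rough sketch of an alternative, the label column can be split off, the remaining text piped through a single command invocation, and the two halves pasted back together. The helper name `transliterate_column` is hypothetical. The sketch assumes the file is tab-separated with a tab on every line, and that the command reads stdin and writes exactly one output line per input line.

```shell
#!/bin/bash
# transliterate_column FILE CMD [ARGS...]
# Hypothetical helper: keeps column 1 of a tab-separated FILE unchanged and
# pipes the remaining columns through CMD once, instead of once per line.
# Assumes CMD emits exactly one output line per input line.
transliterate_column() {
  in=$1
  shift
  tmp=$(mktemp)
  # Save the label column so it can be re-attached afterwards.
  cut -f1 "$in" > "$tmp"
  # Send everything after the first tab through the command in a single pass,
  # then glue labels and transliterated text back together line by line.
  cut -f2- "$in" | "$@" | paste "$tmp" -
  rm -f "$tmp"
}

# Possible usage (the language code kan is an assumption, not taken from the gist):
# transliterate_column wiki.kn.txt indictrans -s kan -t eng --ml --build-lookup > wiki.kn.roman.txt
```

Unlike `read -r col1 rest`, which splits on any whitespace, this version splits only on tabs, so spacing inside the text column is passed to the command untouched.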

loretoparisi commented Sep 21, 2018

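Example usage (the script takes the source language code as its first argument and passes it to indictrans -s; kan is assumed here for Kannada):

Please find an example dataset for Kannada (ISO 639-1 kn) here.

./invert_indic2roman.sh kan wiki.kn.txt wiki.kn.roman.txt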

Example input data

KN	 ಐಕ್ಯತೆ ಕ್ಷೇಮಾಭಿವೃದ್ಧಿ ಸಂಸ್ಥೆ  ವಿಜಯಪುರ
KN	 ಹೊರಗಿನ ಸಂಪರ್ಕಗಳು 
KN	  ಮಕ್ಕಳ ಸಾಹಿತ್ಯ ಮತ್ತು ಸಾಂಸ್ಖ್ರುತಿಕ ಕ್ಷೇತ್ರದಲ್ಲಿ ಸೇವೆ ಸಲ್ಲಿಸುತ್ತಿರುವ ಸಂಸ್ಠೆ ಮಕ್ಕಳ ಲೋಕ 

Output data

KN	 aikyate kshemabhivruddhi sansthe  vijayapur
KN	 horgin samparkagalu 
KN	  makkal sahitya mattu sanskhruthik kshetradalli seve sallisuttiruv sansthe makkal lok 
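
Since each output row should carry the same label as its input row, one quick sanity check is to compare the first columns of the two files. The following is a minimal sketch, assuming both files are tab-separated; the function name `check_alignment` is hypothetical.

```shell
#!/bin/bash
# check_alignment ORIGINAL TRANSLITERATED
# Hypothetical helper: succeeds (exit status 0) only if both tab-separated
# files have the same sequence of first-column labels, which also implies
# that no rows were dropped or added during transliteration.
check_alignment() {
  in_keys=$(cut -f1 "$1")
  out_keys=$(cut -f1 "$2")
  [ "$in_keys" = "$out_keys" ]
}

# Possible usage:
# check_alignment wiki.kn.txt wiki.kn.roman.txt || echo "row mismatch" >&2
```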

Please see this SF question for more info about this task.
