@fmarani
Created December 13, 2011 23:38
poor man's language classifier
# Build one tiny reference corpus per language.
echo "this is a text in english written only to demonstrate the validity of this method in selecting the right language" > corpus_en
echo "questo è un testo in italiano scritto solamente per dimostrare la validita di questo metodo nel selezionare il linguaggio voluto" > corpus_it

# Italian sample: gzip each corpus together with the sample; the smallest output picks the language.
echo "questa è una prova di testo per testare la versione italiana" > test
(echo $(cat corpus_en test | gzip | wc -c) en; echo $(cat corpus_it test | gzip | wc -c) it) | sort -n | head -n 1

# English sample: same comparison against the same two corpora.
echo "this is a test for the english version" > test
(echo $(cat corpus_en test | gzip | wc -c) en; echo $(cat corpus_it test | gzip | wc -c) it) | sort -n | head -n 1
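
The two pipelines above differ only in the corpus file, so the comparison could be wrapped in a small function. What follows is a minimal sketch, not part of the original gist: the function name `classify` and the convention that every corpus lives in the current directory as `corpus_<lang>` are assumptions made for illustration. The idea is unchanged: a sample shares more repeated substrings with a corpus in its own language, so the pair compresses to fewer bytes.

```shell
#!/bin/sh
# classify FILE: print the tag of the corpus that compresses best together with FILE.
# Hypothetical helper; assumes corpus files named corpus_<lang> in the current directory.
classify() {
    for corpus in corpus_*; do
        # Compressed size of corpus + sample, followed by the language tag.
        size=$(cat "$corpus" "$1" | gzip -c | wc -c)
        echo "$size ${corpus#corpus_}"
    done | sort -n | head -n 1 | awk '{print $2}'
}
```

With the files created above, `classify test` prints only the language tag of the best match. Adding a language then amounts to dropping another `corpus_<lang>` file next to the others.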