@Naouak
Last active August 29, 2015 14:06
Random French noun generator in Bash, weighted by word frequency in a corpus
#!/bin/bash
# Word list available here: http://www.lexique.org/telLexique.php
# First argument: number of words to generate (default: 1)
WORD_COUNT=${1:-1}
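# Use "." as the decimal separator when awk parses frequencies;
# the variable must be exported for awk to see it
export LC_NUMERIC=C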
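# Total frequency of common nouns ("NOM") in the language, scaled by 100 so that
# shuf, which only draws integers, still resolves frequencies to two decimal places.
# printf "%d" keeps MAX a plain integer, which shuf -i requires.
MAX=$(awk '$4 == "NOM" { s += $6 * 100 } END { printf "%d\n", s }' liste_mots_top.txt)
if [ "$MAX" -lt 1 ]; then
    echo "No nouns (NOM) found in liste_mots_top.txt" >&2
    exit 1
fi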
WORDS=""
for i in $(seq 1 "$WORD_COUNT"); do
    # Random number between 1 and the total frequency
    NUM=$(shuf -i 1-"$MAX" -n 1)
    # Find the word whose cumulative-frequency interval (prev, s] contains NUM
    WORD=$(awk -v num="$NUM" '
        $4 == "NOM" {
            prev = s
            s += $6 * 100
            if (prev < num && s >= num) { print $1; exit }
        }' liste_mots_top.txt)
    # Append the word, separated from the previous ones by a single space
    WORDS=${WORDS:+$WORDS }$WORD
done
echo "$WORDS"
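The MAX computation depends on how awk turns numbers into text. A minimal demonstration, independent of the word list (1234567.5 is an arbitrary example value): print formats a non-integral number with OFMT, "%.6g" by default, so a large frequency sum comes out in exponent notation, which shuf -i does not accept as a range bound, while printf "%d" truncates it to a plain integer.

```shell
#!/bin/bash
# print converts a non-integral number with OFMT ("%.6g" by default)
LC_NUMERIC=C awk 'BEGIN { s = 1234567.5; print s }'           # prints 1.23457e+06
# printf "%d" truncates to an integer usable as a shuf -i bound
LC_NUMERIC=C awk 'BEGIN { s = 1234567.5; printf "%d\n", s }'  # prints 1234567
```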
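The script above rereads liste_mots_top.txt once per generated word. As a sketch of an alternative (not the author's method), the same frequency-weighted draw can be done in a single awk pass: store each noun's cumulative frequency, then binary-search that array with a random value from awk's rand() instead of shuf. It assumes the same column layout as the script (category in column 4, frequency in column 6); pick_nouns is a hypothetical helper name, and seeding srand() from Bash's $RANDOM is an assumption, chosen so that two runs within the same second still differ.

```shell
#!/bin/bash
# Hypothetical helper: pick_nouns FILE [COUNT]
# Assumes column 4 = grammatical category ("NOM"), column 6 = frequency.
pick_nouns() {
    local file=$1 count=${2:-1}
    LC_NUMERIC=C awk -v count="$count" -v seed="$RANDOM" '
        $4 == "NOM" && $6 > 0 {
            n++
            word[n] = $1
            total += $6
            cum[n] = total            # cumulative frequency up to noun n
        }
        END {
            if (n == 0) exit 1        # no nouns in the file
            srand(seed)
            out = ""
            for (k = 1; k <= count; k++) {
                r = rand() * total    # uniform in [0, total)
                lo = 1; hi = n
                while (lo < hi) {     # first index whose cum[] exceeds r
                    mid = int((lo + hi) / 2)
                    if (cum[mid] > r) hi = mid; else lo = mid + 1
                }
                out = out (k > 1 ? " " : "") word[lo]
            }
            print out
        }' "$file"
}
```

For example, `pick_nouns liste_mots_top.txt 5` would print five nouns on one line. After the single pass, each draw costs a binary search instead of a full scan of the file; the trade-off is that the randomness now comes from awk's rand(), whose quality depends on the awk implementation.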