@jiru
Created July 9, 2020 09:46
Consistency-check script for per-language exports
#!/bin/bash
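# DL_DIR must point at an existing directory; the runner is expected to set it.
if [ ! -d "$DL_DIR" ]; then
    echo "Please run this script using the runner, like this:"
    echo "./docs/cron/runner.sh $0"
    exit 1
fi
# Strip a trailing slash, if any, so paths built from DL_DIR stay clean.
DL_DIR=${DL_DIR%/}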
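# Compare the global export of one kind with the concatenation of its
# per-language exports. Both sides are sorted, so only the set of lines
# matters, not their order.
check_per_language_consistency() {
    local kind="$1"
    diff -u \
        <(tar xOf "$DL_DIR/$kind.tar.bz2" "$kind.csv" | sort) \
        <(bzcat "$DL_DIR"/per_language/*/*_"$kind".tsv.bz2 | sort)
}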
for kind in sentences_detailed links sentences tags \
            sentences_in_lists sentences_with_audio \
            user_languages sentences_CC0 \
            transcriptions sentences_base; do
    echo -n "Checking consistency of $kind... "
    # diff prints any mismatch and returns non-zero when the exports differ.
    if check_per_language_consistency "$kind"; then
        echo "OK"
    else
        echo "FAILED (see above)"
    fi
done
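
The check above relies on bash process substitution. As a minimal sketch, assuming plain uncompressed input files, the same order-independent comparison can be written for shells without that feature by sorting into temporary files. The helper name same_lines is hypothetical and not part of the original script.

```shell
# Hypothetical helper: compare two files as unordered collections of lines.
# Returns 0 when both files contain the same lines (same multiplicity),
# whatever their order; otherwise prints a unified diff and returns non-zero.
# Note: the variables below are global, since POSIX sh has no "local".
same_lines() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || { rm -f "$a_sorted"; return 2; }
    # LC_ALL=C gives a byte-wise, locale-independent sort order.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    diff -u "$a_sorted" "$b_sorted"
    status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}
```

Calling same_lines global.csv merged.tsv (both file names are placeholders) behaves like the diff in check_per_language_consistency: no output and exit status 0 when the two files hold the same lines.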