
@msoutopico
Last active September 8, 2023 09:00
Find common labels between adult and student QQ xml files
#!/usr/bin/env bash
ROOT="/home/souto/Sync/PISA25/Tech/QQ/QQ_overlap"
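# Abort if the working directory is missing, so nothing is written elsewhere
cd "${ROOT}" || exit 1
# Start from empty aggregate files so a re-run does not append to stale data
: > QS_all.txt
: > QA_all.txt
# NUL-delimited find output keeps paths with spaces or glob characters intact
find "$ROOT" -name "*.xml" -print0 | while IFS= read -r -d '' f
do
    f_basename="$(basename -- "$f" .xml)"
    output_fname="${f_basename}.txt"
    # Extract the content of every <text> element (-P requires GNU grep)
    grep -Poh "(?<=<text>).+?(?=</text>)" "$f" | sort > "$output_fname"
    # STQ and ICQ files feed the student list (QS); all others feed the adult list (QA)
    if [[ "$f_basename" == *STQ* ]] || [[ "$f_basename" == *ICQ* ]]; then
        cat "$output_fname" >> QS_all.txt
    else
        cat "$output_fname" >> QA_all.txt
    fi
done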
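# Arch-specific: the Perl rename tool is installed as perl-rename there
# (on Debian/Ubuntu it is usually called rename or file-rename)
perl-rename 's/^PISA_2025FT_QQ_(.+)-.+\.txt$/$1.txt/' *.txt
# comm requires both inputs to be sorted
sort QS_all.txt > QS_all_sorted.txt
sort QA_all.txt > QA_all_sorted.txt
# -12 suppresses lines unique to either file, leaving only labels present in both;
# comm output is already sorted, so uniq alone removes repeated labels
comm -12 QS_all_sorted.txt QA_all_sorted.txt | uniq > overlap_s-vs-a_qq.txt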
# cat 04_QQS_N/PISA_2025FT_QQ_ICQ-ICTQuestionnaire.xml | grep -Poh "(?<=<text>).+?(?=</text>)" > qqs_icq.txt
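
The extract-then-intersect idea in the script above can be tried on toy data without the real questionnaire files. The following is a minimal sketch, not the author's workflow: the file names (`student.xml`, `adult.xml`), the `<items>`/`<item>` wrapper elements, the labels and the `extract_labels` helper are all made up for illustration. Only the `<text>` element and the `grep`/`sort`/`comm` calls come from the script. It assumes GNU grep (for `-P`).

```shell
#!/usr/bin/env bash
# Toy illustration only: file names, wrapper elements and labels are hypothetical.
export LC_ALL=C   # make sort and comm use the same collation order

workdir="$(mktemp -d)"

# Two tiny stand-ins for a student and an adult questionnaire.
cat > "$workdir/student.xml" <<'EOF'
<items>
  <item><text>Yes</text></item>
  <item><text>No</text></item>
  <item><text>Agree</text><text>Yes</text></item>
</items>
EOF
cat > "$workdir/adult.xml" <<'EOF'
<items>
  <item><text>Agree</text></item>
  <item><text>Disagree</text></item>
  <item><text>Yes</text></item>
</items>
EOF

# Print the content of every <text> element, sorted and de-duplicated.
extract_labels() {
    grep -Poh '(?<=<text>).+?(?=</text>)' "$1" | sort -u
}

extract_labels "$workdir/student.xml" > "$workdir/student.txt"
extract_labels "$workdir/adult.xml" > "$workdir/adult.txt"

# comm -12 hides lines unique to either file, leaving only the shared labels.
overlap="$(comm -12 "$workdir/student.txt" "$workdir/adult.txt")"
printf '%s\n' "$overlap"

rm -rf "$workdir"
```

Setting `LC_ALL=C` is a design choice of this sketch rather than something the original script does: `comm` expects its inputs in the same collation order that `sort` produced, and pinning the locale removes one source of "file is not in sorted order" warnings.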