Spearman correlation: Wikidata QRank and Wikidata PageRank (danker)
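#!/usr/bin/env bash
# Abort if any command in a pipeline fails (e.g. a failed wget), so that
# a broken download never leaves behind a truncated "sorted" file.
set -euo pipefail
# Byte-wise collation, so that `sort` and `join` agree on key ordering.
export LC_ALL=C

# QRank: drop the CSV header, turn the comma into a tab, sort by Wikidata ID.
if [ ! -f "qrank_sorted.tsv" ]; then
  wget -O - https://qrank.toolforge.org/download/qrank.csv.gz | \
    gunzip -c | \
    tail -n +2 | \
    sed "s/,/\t/" | \
    sort -k1,1 \
    > qrank_sorted.tsv.tmp
  # Only keep the result once the whole pipeline has succeeded.
  mv qrank_sorted.tsv.tmp qrank_sorted.tsv
fi

# danker PageRank (2021-11-15, allwiki links): sort by Wikidata ID.
if [ ! -f "pr_202111_sorted.tsv" ]; then
  wget -O - https://danker.s3.amazonaws.com/2021-11-15.allwiki.links.rank.bz2 | \
    bunzip2 -c | \
    sort -k1,1 \
    > pr_202111_sorted.tsv.tmp
  mv pr_202111_sorted.tsv.tmp pr_202111_sorted.tsv
fi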
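# Join both rankings on the Wikidata ID; `join` writes space-separated
# lines "ID QRank PageRank". Then report the line count of each file.
join qrank_sorted.tsv pr_202111_sorted.tsv > qrank_pr_joined.tsv
wc -l qrank_sorted.tsv pr_202111_sorted.tsv qrank_pr_joined.tsv
# Spearman rank correlation between QRank (column 2) and PageRank (column 3).
Rscript -e "qpr <- read.table('qrank_pr_joined.tsv', sep = ' '); cor(qpr[[2]], qpr[[3]], method = 'spearman')"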
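If R is not available, the same statistic can be sketched with standard Unix tools. Spearman's rho is the Pearson correlation of the ranks, where tied values receive the average of the positions they span (the default tie rule of R's `rank()`, which `cor(..., method = 'spearman')` relies on). The functions `average_ranks` and `spearman` below are hypothetical helpers, not part of the original gist; they assume the space-separated "ID x y" layout that `join` produces in `qrank_pr_joined.tsv`. The result should agree with R up to floating-point rounding, though this sketch has not been checked against the full joined file.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not from the original gist): Spearman's rho
# without R, for whitespace-separated lines of the form "id x y".
set -euo pipefail
export LC_ALL=C

# average_ranks COL FILE
# Prints the rank of column COL for every line of FILE, in the original
# line order. Steps: tag each value with its line number, sort by value
# (general numeric, so values like 1e-05 are handled), give each run of
# tied values the mean of the positions it occupies, then sort back by
# line number and keep only the rank.
average_ranks() {
  local col=$1 file=$2
  awk -v c="$col" '{ print NR, $c }' "$file" |
    sort -k2,2g |
    awk '
      { line[NR] = $1; val[NR] = $2 + 0 }
      END {
        i = 1
        while (i <= NR) {
          # extend j over the run of values tied with val[i]
          j = i
          while (j < NR && val[j + 1] == val[i]) j++
          # %.1f prints integer and half-integer ranks exactly
          for (k = i; k <= j; k++) printf "%s %.1f\n", line[k], (i + j) / 2
          i = j + 1
        }
      }' |
    sort -k1,1n |
    cut -d' ' -f2
}

# spearman FILE
# Pearson correlation of the two rank columns, computed in one pass with
# Welford-style running means to limit cancellation on large inputs.
# Prints NA if fewer than two lines or a column has zero variance.
spearman() {
  local file=$1
  paste -d' ' <(average_ranks 2 "$file") <(average_ranks 3 "$file") |
    awk '
      {
        n++
        dx = $1 - mx; mx += dx / n
        dy = $2 - my; my += dy / n
        sxx += dx * ($1 - mx)
        syy += dy * ($2 - my)
        sxy += dx * ($2 - my)
      }
      END {
        if (n < 2 || sxx == 0 || syy == 0) { print "NA"; exit }
        printf "%.6f\n", sxy / sqrt(sxx * syy)
      }'
}
```

After sourcing the functions, `spearman qrank_pr_joined.tsv` prints rho to six decimals. Each column is sorted twice on disk, and the ranking step keeps one column in awk's memory at a time, so this trades speed for not needing R.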
@madrisan

https://qrank.toolforge.org/download/qrank.csv.gz redirects to https://qrank.wmcloud.org/download/qrank.csv.gz, which has been broken for at least a week (HTTP error 502).
Do you know of any other URL that provides the same data?

@athalhammer (Author)

Good question, thanks @madrisan. Maybe @brawer can help here.

@brawer commented Aug 29, 2022

Sorry about this! See brawer/wikidata-qrank#8
