Created May 26, 2018 21:47
commands to find Wikidata lemmas with a space
#!/bin/bash
# Read tab-separated lines (lexeme ID, language, lemma) from stdin
# and print only those whose lemma contains a space.
while IFS=$'\t' read -r lid language lemma; do
    if [[ $lemma = *' '* ]]; then
        printf '%s\t%s\t%s\n' "$lid" "$language" "$lemma"
    fi
done
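To try the filter without calling the API, the same loop can be fed a few hand-written rows. This is only a sketch: the function name `filter_lemmas_with_space` is hypothetical, and the sample rows are made up rather than taken from Wikidata.

```shell
#!/bin/bash
# Hypothetical wrapper around the same loop as the filter script above.
filter_lemmas_with_space() {
    while IFS=$'\t' read -r lid language lemma; do
        # Keep the row only if the lemma contains a space.
        if [[ $lemma = *' '* ]]; then
            printf '%s\t%s\t%s\n' "$lid" "$language" "$lemma"
        fi
    done
}

# Made-up sample rows (not real Wikidata data): ID, language code, lemma.
printf 'L1\ten\tice cream\nL2\ten\twater\nL3\tde\tguten Morgen\n' |
    filter_lemmas_with_space
```

With this sample input, only the `L1` and `L3` rows should be printed, since `water` contains no space.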
#!/bin/bash
# Fetch lexemes L1 to L1500 in batches of 50 IDs per request and
# print one tab-separated line (lexeme ID, language, lemma) per lemma.
for i in {1..1500..50}; do
    # Build the ID list for this batch, e.g. "L1|L2|...|L50".
    ids="L$i"
    for ((j=1;j<50;j++)); do
        ids+="|L$((i+j))"
    done
    curl \
        --silent \
        --data-urlencode action=wbgetentities \
        --data-urlencode format=json \
        --data-urlencode formatversion=2 \
        --data-urlencode ids="$ids" \
        --data-urlencode props=labels \
        https://www.wikidata.org/w/api.php |
        jq -r '
            .entities |
            .[] |
            .id as $id |
            .lemmas |
            # "?" skips entities that have no lemmas (e.g. missing lexemes)
            .[]? |
            (
                $id + "\t" +
                .language + "\t" +
                .value
            )
        '
    # Pause between requests to go easy on the API.
    sleep 1
done
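The batching in the script above can be checked without any network access. The sketch below isolates just the ID-list construction; the helper name `build_ids` and its two parameters are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: print "L<start>|L<start+1>|..." for <count>
# consecutive lexeme IDs, the format passed to the ids= parameter above.
build_ids() {
    local start=$1 count=$2 j
    local ids="L$start"
    for ((j = 1; j < count; j++)); do
        ids+="|L$((start + j))"
    done
    printf '%s\n' "$ids"
}

build_ids 1 5      # first five IDs of the first batch
build_ids 1451 3   # first three IDs of the last batch
```

The two calls should print `L1|L2|L3|L4|L5` and `L1451|L1452|L1453`. Calling `build_ids "$i" 50` inside the outer loop would reproduce the `ids` value that the script builds inline.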
To print all lemmas with a space and also save all lemmas (with or without a space) to a file `lemmas`, save the above files, make them executable, and then run the following command in the same directory: