@lucaswerkmeister
Created May 26, 2018 21:47
commands to find Wikidata lemmas with a space
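#!/bin/bash
# filterLemmasWithSpace: read "ID<TAB>language<TAB>lemma" lines on stdin
# and print only those whose lemma contains a space.
while IFS=$'\t' read -r lid language lemma; do
    if [[ $lemma = *' '* ]]; then
        printf '%s\t%s\t%s\n' "$lid" "$language" "$lemma"
    fi
done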
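#!/bin/bash
# getLemmas: fetch lexemes L1 to L1500 from Wikidata, 50 per request, and
# print one "ID<TAB>language<TAB>lemma" line per lemma.
for i in {1..1500..50}; do
    # Build a pipe-separated batch of 50 IDs: L$i|L$((i+1))|...|L$((i+49)).
    ids="L$i"
    for ((j=1;j<50;j++)); do
        ids+="|L$((i+j))"
    done
    # Entities without lemmas (e.g. missing IDs) yield null for .lemmas;
    # "// {}" skips them so the rest of the batch is still printed.
    curl \
        --silent \
        --data-urlencode action=wbgetentities \
        --data-urlencode format=json \
        --data-urlencode formatversion=2 \
        --data-urlencode ids="$ids" \
        --data-urlencode props=labels \
        https://www.wikidata.org/w/api.php |
        jq -r '
            .entities |
            .[] |
            .id as $id |
            (.lemmas // {}) |
            .[] |
            (
                $id + "\t" +
                .language + "\t" +
                .value
            )
        '
    # Pause between requests to go easy on the API.
    sleep 1
done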
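The inner loop of the second script packs 50 consecutive lexeme IDs into one pipe-separated `ids` value, so the whole range takes 30 requests instead of 1500. A minimal sketch of just that batching step, pulled out into a function so it can be tried without touching the network; the name `build_ids` and its two arguments are hypothetical and not part of the original script:

```shell
#!/bin/bash
# build_ids START COUNT: print COUNT consecutive lexeme IDs starting at
# L<START>, joined with "|".
# (Hypothetical helper; the original script inlines this loop.)
build_ids() {
    local start=$1 count=$2 ids j
    ids="L$start"
    for ((j = 1; j < count; j++)); do
        ids+="|L$((start + j))"
    done
    printf '%s\n' "$ids"
}

build_ids 1 5      # L1|L2|L3|L4|L5
build_ids 1451 3   # L1451|L1452|L1453
```

With such a helper, the outer loop could call `ids=$(build_ids "$i" 50)`; the batch size of 50 is taken from the script as written.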
@lucaswerkmeister
Author

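To print all lemmas with a space and also save all lemmas (with or without a space) to a file named lemmas, save the two scripts above as filterLemmasWithSpace and getLemmas, make them executable, and then run the following command in the same directory (the leading 2>/dev/null discards error output from getLemmas):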

2>/dev/null ./getLemmas | tee lemmas | ./filterLemmasWithSpace
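To check the filter stage on its own before running the full pipeline, it can be fed a few hand-written rows. The sketch below wraps the same loop in a function (`filter_lemmas_with_space` is a hypothetical stand-in for `./filterLemmasWithSpace`), and the sample rows are invented for illustration, not real Wikidata output:

```shell
#!/bin/bash
# Hypothetical stand-in for ./filterLemmasWithSpace, using the same logic.
filter_lemmas_with_space() {
    local lid language lemma
    while IFS=$'\t' read -r lid language lemma; do
        if [[ $lemma = *' '* ]]; then
            printf '%s\t%s\t%s\n' "$lid" "$language" "$lemma"
        fi
    done
}

# Invented sample rows: lexeme ID, language code, lemma (tab-separated).
sample=$'L1\ten\tapple\nL2\ten\thot dog\nL3\tde\tGuten Tag\n'

# Only the rows whose lemma contains a space should survive.
result=$(printf '%s' "$sample" | filter_lemmas_with_space)
printf '%s\n' "$result"
```

Because `lemma` is the last variable passed to `read` and `IFS` is set to a tab only, spaces inside the lemma stay part of that field instead of being split off.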
