
@slint
Last active December 27, 2023 13:29
Fetch Zenodo stats for communities and queries
#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
COMMUNITY_ID="community_id" # Replace with the identifier of the target community
OUTPUT_CSV="${COMMUNITY_ID}_community_stats.csv"
# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"
# Download all records (including multiple versions) from the community (max 10k records)
curl -s -G "https://zenodo.org/api/records/" -d "size=10000" -d "all_versions=true" -d "communities=${COMMUNITY_ID}" \
`# Process with jq to extract the required fields` \
| jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
>> "${OUTPUT_CSV}"
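Before pointing the script at a large community, the jq filter can be sanity-checked offline against a stub response. This is a minimal sketch: the field names mirror Zenodo's search payload, but the record values below are made up.

```shell
#!/bin/bash
# Offline check of the extraction filter against a stub API response.
# The structure matches what the script expects; the values are placeholders.
jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title,
                       .metadata.publication_date, .stats.views, .stats.downloads] | @csv' <<'EOF'
{"hits": {"hits": [
  {"links": {"self": "https://zenodo.org/api/records/1"},
   "metadata": {"doi": "10.5281/zenodo.1", "title": "Example, with a comma",
                "publication_date": "2023-01-01"},
   "stats": {"views": 10, "downloads": 2}}
]}}
EOF
```

Note that `@csv` quotes string fields, so a comma inside a title stays within a single CSV column.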

#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
QUERY="grants.code:12345" # Replace with your search query
OUTPUT_CSV="[${QUERY}]_query_stats.csv"
# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"
# Download all records (including multiple versions) for the query (max 10k records)
curl -s -G "https://zenodo.org/api/records/" -d "size=10000" -d "all_versions=true" --data-urlencode "q=${QUERY}" \
`# Process with jq to extract the required fields` \
| jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
>> "${OUTPUT_CSV}"
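Queries containing spaces or other reserved characters must be percent-encoded before they go into the URL; `curl -G --data-urlencode` does this for you, but if you build the URL yourself, jq's `@uri` filter can do the encoding. A small sketch (the query string is just an illustrative example):

```shell
#!/bin/bash
# Percent-encode a search query with jq's @uri filter before building the
# request URL by hand. A colon becomes %3A and a space becomes %20.
QUERY='grants.code:12345 AND title:climate'
ENCODED=$(jq -rn --arg q "$QUERY" '$q | @uri')
echo "https://zenodo.org/api/records/?q=${ENCODED}&size=10000&all_versions=true"
```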

#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
USER_ID="12345" # Replace with the numeric Zenodo user ID
OUTPUT_CSV="${USER_ID}_user_stats.csv"
# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"
# Download all records (including multiple versions) for the user (max 10k records)
curl -s -G "https://zenodo.org/api/records/" -d "size=10000" -d "all_versions=true" --data-urlencode "q=owners:${USER_ID}" \
`# Process with jq to extract the required fields` \
| jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
>> "${OUTPUT_CSV}"
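Once a stats CSV exists, totals can be summed with awk. Since Views and Downloads are the last two columns and numbers contain no commas, counting fields from the end stays correct even when a quoted title contains commas. The `sum_stats` helper name and the sample rows below are hypothetical:

```shell
#!/bin/bash
# Sum the Views and Downloads columns of a generated stats CSV.
# $(NF-1) and $NF are the last two comma-separated fields, so a comma
# inside a quoted title (which shifts earlier fields) does not affect them.
sum_stats() {
  awk -F, 'NR > 1 { views += $(NF-1); downloads += $NF }
           END { printf "views=%d downloads=%d\n", views, downloads }' "$1"
}

# Demo on a throwaway file with one comma-containing title:
CSV=$(mktemp)
printf '%s\n' \
  'URL,DOI,Title,PublicationDate,Views,Downloads' \
  '"https://zenodo.org/api/records/1","10.5281/zenodo.1","Plain title","2023-01-01",10,2' \
  '"https://zenodo.org/api/records/2","10.5281/zenodo.2","Title, with comma","2023-02-01",5,1' \
  > "$CSV"
sum_stats "$CSV" # views=15 downloads=3
rm -f "$CSV"
```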
@kathrin77

I manage to fetch some communities that way (most of the smaller ones, anyway), but not all, e.g. like this:
curl -s -G "https://zenodo.org/api/records?communities=${COMMUNITY_ID}" -d "size=10000" -d "all_versions=true" | jq -r '.hits.hits[] | [.metadata.doi, .metadata.title, .stats.unique_views, .stats.unique_downloads, .metadata.communities[][]] |@csv' >> "${OUTPUT_CSV}"

It works with the community 'lory_unilu_tf' but not with 'lory_unilu'. The first is a smaller community, the second a bigger one. So far the biggest community I've managed to fetch had about 500 records; one with around 600 records fails. I've tried leaving out 'metadata.title' in the hope that some wonky character was causing the error, but that hasn't helped.
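One plausible cause for larger communities failing is a single record missing one of the selected fields: jq aborts mid-stream on an error, and `.metadata.communities[][]` errors if a record has no `communities` key at all. A hedged variant using jq's `//` alternative operator to substitute defaults, shown here against a stub record with no stats and no communities (the assumption that `communities` entries carry an `id` field matches the filter in the comment above):

```shell
#!/bin/bash
# Defensive extraction filter: `//` substitutes a default when a field is
# missing or null, so one incomplete record no longer aborts the pipeline.
# The stub record below deliberately lacks stats and communities.
jq -r '.hits.hits[] | [
        .metadata.doi,
        .metadata.title,
        (.stats.unique_views // 0),
        (.stats.unique_downloads // 0),
        ((.metadata.communities // []) | map(.id) | join(";"))
      ] | @csv' <<'EOF'
{"hits": {"hits": [
  {"metadata": {"doi": "10.5281/zenodo.3", "title": "Incomplete record"}}
]}}
EOF
```

The missing stats come out as `0` and the missing community list as an empty string, instead of killing the whole run.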
