@slint
Last active December 27, 2023 13:29
Fetch Zenodo stats for communities and queries
#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
COMMUNITY_ID="community_id"
OUTPUT_CSV="${COMMUNITY_ID}_community_stats.csv"
# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"
# Download all records (including multiple versions) from the community (max 10k records)
curl -s -G "https://zenodo.org/api/records/" -d "size=10000" -d "all_versions=true" -d "communities=${COMMUNITY_ID}" \
`# Process with jq to extract the required fields` \
| jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
>> "${OUTPUT_CSV}"

#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
QUERY="grants.code:12345"
OUTPUT_CSV="[${QUERY}]_query_stats.csv"
# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"
# Download all records (including multiple versions) for the query (max 10k records)
curl -s -G "https://zenodo.org/api/records" -d "size=10000" -d "all_versions=true" -d "q=${QUERY}" \
`# Process with jq to extract the required fields` \
| jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
>> "${OUTPUT_CSV}"

#!/bin/bash
# To use this script you need to have "curl" and "jq" installed.
USER_ID="12345"
OUTPUT_CSV="${USER_ID}_user_stats.csv"
# Create CSV file header
echo "URL,DOI,Title,PublicationDate,Views,Downloads" > "${OUTPUT_CSV}"
# Download all records (including multiple versions) for the user (max 10k records)
curl -s -G "https://zenodo.org/api/records" -d "size=10000" -d "all_versions=true" -d "q=owners:${USER_ID}" \
`# Process with jq to extract the required fields` \
| jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
>> "${OUTPUT_CSV}"
@phoenix-mossimo

Hi, are the scripts still valid? I get "parse error: Invalid numeric literal at line 1, column 10"

@kathrin77

Hi, I am getting the same parse error. There seem to be two different issues:

  1. Remove the final '/' in the Zenodo URL in the curl command. For instance, it works with:
    curl -s -G "https://zenodo.org/api/records" -d "size=100" -d "all_versions=true" -d "communities=xxx"
    (but not with "https://zenodo.org/api/records/")

  2. It seems that the Zenodo API no longer allows harvesting thousands of records in one go (504 Gateway Time-out). I tried with 4000 records => not working; with 100 records => working. The same happens in the browser and with OpenRefine (which is how I used to do my end-of-year statistics last year).

Thanks for a hint on how to get all records (stats) for one specific community. It seems that the Zenodo API changed after the October update, because it worked until then.
Best regards, Kathrin
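
One workaround for the timeout is to page through the results in smaller chunks instead of requesting everything at once. A minimal sketch, assuming the Zenodo REST API's "page" parameter behaves as documented (the overall 10,000-record cap on size × page still applies):

PAGE=1
while true; do
  # Fetch one page of 100 records and convert it to CSV rows
  ROWS=$(curl -s -G "https://zenodo.org/api/records" \
    -d "communities=${COMMUNITY_ID}" -d "all_versions=true" \
    -d "size=100" -d "page=${PAGE}" \
    | jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv')
  # Stop when a page comes back empty (or the response is not valid JSON)
  [ -z "${ROWS}" ] && break
  echo "${ROWS}" >> "${OUTPUT_CSV}"
  PAGE=$((PAGE + 1))
done

Each page is a separate small request, so no single call has to return thousands of records.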

@phoenix-mossimo

Hi, removing the trailing slash (/) and reducing the size does not help in my case: the original error is gone, but the wrong documents are fetched. The only workaround so far is to pack all options into the URL: curl -G "https://zenodo.org/api/records?communities=operaseu&all_versions=true&size=10000" \
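
For reference, the complete pipeline with that packed-URL workaround would presumably look something like this, reusing the jq filter from the scripts above:

curl -s -G "https://zenodo.org/api/records?communities=operaseu&all_versions=true&size=10000" \
  | jq -r '.hits.hits[] | [.links.self, .metadata.doi, .metadata.title, .metadata.publication_date, .stats.views, .stats.downloads] | @csv' \
  >> "${OUTPUT_CSV}"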

@kathrin77

I manage to fetch some communities that way (most of the smaller ones, anyway), but not all. E.g. like this:
curl -s -G "https://zenodo.org/api/records?communities=${COMMUNITY_ID}" -d "size=10000" -d "all_versions=true" | jq -r '.hits.hits[] | [.metadata.doi, .metadata.title, .stats.unique_views, .stats.unique_downloads, .metadata.communities[][]] | @csv' >> "${OUTPUT_CSV}"

This works with the community 'lory_unilu_tf' but not with 'lory_unilu'. The first is a smaller community, the second a bigger one. So far the biggest community I have managed is about 500 records; it fails at around 600. I've tried leaving out 'metadata.title' in the hope that some wonky character was generating the error, but that has not helped.
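
The "Invalid numeric literal" message is jq choking on a non-JSON body, e.g. the HTML page of a 504 Gateway Time-out. Capturing the HTTP status code before parsing makes the actual failure visible; a minimal sketch using standard curl flags (the /tmp/zenodo_response.json path is just an illustrative choice):

# Save the body to a file and the status code to a variable
HTTP_CODE=$(curl -s -o /tmp/zenodo_response.json -w "%{http_code}" -G "https://zenodo.org/api/records" \
  -d "communities=${COMMUNITY_ID}" -d "all_versions=true" -d "size=100")
if [ "${HTTP_CODE}" = "200" ]; then
  # Response is real JSON; safe to hand to jq
  jq -r '.hits.hits[] | [.metadata.doi, .metadata.title, .stats.unique_views, .stats.unique_downloads] | @csv' /tmp/zenodo_response.json >> "${OUTPUT_CSV}"
else
  # Anything else (e.g. a 504 HTML page) would have broken jq; show it instead
  echo "Request failed with HTTP ${HTTP_CODE}:" >&2
  cat /tmp/zenodo_response.json >&2
fi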
