Skip to content

Instantly share code, notes, and snippets.

@atrisovic
Last active May 3, 2021 21:11
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save atrisovic/475a790ae7ab0a8ff8482735e16ee898 to your computer and use it in GitHub Desktop.
Save atrisovic/475a790ae7ab0a8ff8482735e16ee898 to your computer and use it in GitHub Desktop.

Dataset stats from Dataverse

in the format DOI, release_year, mime_type

SQL DB query:

SELECT p.authority, p.identifier, f.contenttype, p.publicationdate 
FROM datafile f, dvobject o, dataset s, dvobject p 
WHERE f.id = o.id AND o.owner_id = s.id AND s.id = p.id AND s.harvestingclient_id IS NULL

Then filter and sort the result with:

sed 's/\-[0-9][0-9]\-[0-9][0-9] [^ ]*$//' | awk -F'\t' '{
if ($4) {print $1"/"$2, $4, $3}else{print $1"/"$2, "N/A", $3}}' | sort 

Dataset:

10.7910/DVN/00234 2014 application/zip
10.7910/DVN/00234 2014 application/zip
10.7910/DVN/00234 2014 text/plain; charset=US-ASCII
10.7910/DVN/0049230 2014 application/pdf
10.7910/DVN/0049230 2014 application/vnd.ms-excel
10.7910/DVN/004HG6 N/A text/plain
10.7910/DVN/007GT 2015 application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
10.7910/DVN/007GT 2015 application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

Note: query does not look for the latest published version of the dataset. Non-published datasets have N/A as release year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment