SQL DB query:
SELECT p.authority, p.identifier, f.contenttype, p.publicationdate
FROM datafile f, dvobject o, dataset s, dvobject p
WHERE f.id = o.id AND o.owner_id = s.id AND s.id = p.id AND s.harvestingclient_id IS NULL
Then filter and sort the result with:
sed 's/\-[0-9][0-9]\-[0-9][0-9] [^ ]*$//' | awk -F'\t' '{
if ($4) {print $1"/"$2, $4, $3}else{print $1"/"$2, "N/A", $3}}' | sort
Dataset:
10.7910/DVN/00234 2014 application/zip
10.7910/DVN/00234 2014 application/zip
10.7910/DVN/00234 2014 text/plain; charset=US-ASCII
10.7910/DVN/0049230 2014 application/pdf
10.7910/DVN/0049230 2014 application/vnd.ms-excel
10.7910/DVN/004HG6 N/A text/plain
10.7910/DVN/007GT 2015 application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
10.7910/DVN/007GT 2015 application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Note: query does not look for the latest published version of the dataset. Non-published datasets have N/A as release year.