@acka47
Last active May 16, 2023 07:52
A count of the different formats recorded in BARTOC

Generated with the following steps (thanks Jakob, https://twitter.com/nichtich/status/1410526898694807552):

  1. $ curl https://bartoc.org/data/dumps/latest.ndjson | jq -r '.FORMAT[].uri' > bartoc-formats.txt
  2. $ sort bartoc-formats.txt | uniq -c | sort -nr
  3. Remove http://bartoc.org/en/Format/ (for better readability)
  4. Make it a Markdown table
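
Steps 2 to 4 can be sketched as a single pipeline. This is only an illustration under assumptions, not the commands actually used for the table: the function name `formats_to_markdown` is hypothetical, and it assumes one URI per line on standard input and format names without spaces.

```shell
#!/bin/sh
# Sketch: read format URIs (one per line) from stdin and print a Markdown count table.
# formats_to_markdown is a hypothetical helper name, not part of the original steps.
formats_to_markdown() {
  printf '| Count | Format |\n'
  printf '|------:|--------|\n'
  # Step 3: strip the URI prefix. Step 2: count duplicates, most frequent first.
  # Step 4: wrap each "count name" pair in Markdown table pipes.
  sed 's|^http://bartoc.org/en/Format/||' \
    | sort | uniq -c | sort -nr \
    | awk '{ printf "| %s | %s |\n", $1, $2 }'
}

# Usage, assuming bartoc-formats.txt was produced by step 1:
#   formats_to_markdown < bartoc-formats.txt
```

Stripping the prefix before counting keeps the `uniq -c` output to two whitespace-separated fields, which is what the `awk` step relies on.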
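
| Count | Format |
|------:|--------|
| 1935 | Online |
| 898 | PDF |
| 846 | RDF |
| 511 | Printed |
| 499 | SKOS |
| 473 | XML |
| 282 | Spreadsheet |
| 182 | HTML |
| 161 | CSV |
| 138 | JSON |
| 111 | MADS |
| 105 | OWL |
| 89 | Zthes |
| 84 | XTM |
| 82 | DC |
| 82 | BS8723-5 |
| 74 | JSON-LD |
| 69 | VDEX |
| 53 | Microform |
| 51 | Word |
| 41 | TXT |
| 35 | CD-ROM |
| 32 | XSD |
| 29 | OBO |
| 26 | MARC |
| 23 | Floppy-Disc |
| 17 | Database |
| 10 | EPUB |
| 9 | Geodata |
| 4 | JSKOS |
| 3 | ClaML |
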
@sroertgen

This no longer works: some entries in the dump do not contain the FORMAT key, which breaks the jq query.

I had to change step 1 to:

curl https://bartoc.org/data/dumps/latest.ndjson | jq -r '(.FORMAT[].uri)?' > bartoc-formats.txt

To see which lines do not contain the FORMAT key, you can use:

curl https://bartoc.org/data/dumps/latest.ndjson | jq -n 'inputs | select(has("FORMAT") | not) | input_line_number'
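
The effect of the error-suppressing `?` can be checked offline on a tiny sample. The following is a minimal sketch under assumptions: the two sample records are made up for illustration (they are not taken from the BARTOC dump), `extract_formats` is a hypothetical helper name, and the filter is written as `.FORMAT[]? | .uri`, a variant of the filter above that also skips records without FORMAT.

```shell
#!/bin/sh
# Two made-up NDJSON records; the second one has no FORMAT key.
sample='{"FORMAT":[{"uri":"http://bartoc.org/en/Format/PDF"}]}
{"prefLabel":{"en":"record without a format"}}'

# extract_formats is a hypothetical helper name.
# .FORMAT[]? yields nothing (instead of an error) when FORMAT is missing;
# -r prints the URIs as raw strings without JSON quotes.
extract_formats() {
  jq -r '.FORMAT[]? | .uri'
}

# Usage:
#   printf '%s\n' "$sample" | extract_formats
#   # prints http://bartoc.org/en/Format/PDF
```

Without the `?`, jq stops at the second record with a "Cannot iterate over null" error, which matches the breakage described above.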
