@pdlug
Created February 4, 2015 04:59
Demo of parsing JATS XML using `clojure.data.zip.xml` to extract contributors.
(ns jats.contribs
  (:require [clojure.java.io :as io]
            [clojure.zip :as zip]
            [clojure.data.xml :as xml]
            [clojure.data.zip.xml :as zx]
            [clojure.pprint :as pprint])
  (:gen-class :main true))

;; Build a name map from the zipper location of a <name> element.
(defn contrib-name [name-loc]
  (let [first-name (zx/xml1-> name-loc :given-names zx/text)
        surname (zx/xml1-> name-loc :surname zx/text)]
    {:first-name first-name
     :surname surname}))

;; Build a contributor map from the zipper location of a <contrib> element.
;; Fields that are absent from the XML come back as nil.
(defn contrib [contrib-loc]
  {:name (zx/xml1-> contrib-loc :name contrib-name)
   :type (zx/xml1-> contrib-loc (zx/attr :contrib-type))
   :role (zx/xml1-> contrib-loc :role zx/text)
   :orcid (zx/xml1-> contrib-loc :contrib-id (zx/attr= :contrib-id-type "orcid") zx/text)})

;; Walk from the <article> root to every <contrib> under front/article-meta.
(defn contributors [data]
  (zx/xml-> data :front :article-meta :contrib-group :contrib contrib))

;; Parse the file into an XML zipper and pretty-print its contributors.
(defn parse [filename]
  (with-open [input-xml (io/input-stream filename)]
    (let [doc (-> input-xml xml/parse zip/xml-zip)]
      (pprint/pprint (contributors doc)))))

(defn -main
  [& args]
  (parse (first args)))
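The predicates used in `contrib` can be tried at the REPL without a file on disk. The following is a minimal sketch, not part of the original gist: the XML fragment, its values, and the names `sample` and `loc` are made up for illustration, and it assumes `clojure.data.xml` and `clojure.data.zip` are on the classpath.

```clojure
(require '[clojure.data.xml :as xml]
         '[clojure.zip :as zip]
         '[clojure.data.zip.xml :as zx])

;; Hypothetical hand-written fragment; not taken from the eLife corpus.
(def sample
  (str "<contrib contrib-type=\"author\">"
       "<contrib-id contrib-id-type=\"other\">abc</contrib-id>"
       "<contrib-id contrib-id-type=\"orcid\">http://orcid.org/0000-0000-0000-0000</contrib-id>"
       "<name><surname>Doe</surname><given-names>Jane</given-names></name>"
       "</contrib>"))

;; Zipper location of the root <contrib> element.
(def loc (-> sample xml/parse-str zip/xml-zip))

;; attr= keeps only the <contrib-id> child whose contrib-id-type is "orcid".
(zx/xml1-> loc :contrib-id (zx/attr= :contrib-id-type "orcid") zx/text)

;; attr reads an attribute of the current element.
(zx/xml1-> loc (zx/attr :contrib-type))

;; xml1-> yields nil when no child matches, which is where the
;; nil :role and :orcid values in the output come from.
(zx/xml1-> loc :role zx/text)
```

The three expressions should evaluate to the ORCID URL string, `"author"`, and `nil` respectively.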
Output of parsing a file from the eLife JATS XML repository:
% lein run -m jats.contribs ~/tmp/elife-articles/elife04193.xml
({:name {:first-name "Haidun", :surname "Yan"},
  :type "author",
  :role nil,
  :orcid nil}
 {:name {:first-name "Juan L", :surname "Pablo"},
  :type "author",
  :role nil,
  :orcid nil}
 {:name {:first-name "Chaojian", :surname "Wang"},
  :type "author",
  :role nil,
  :orcid nil}
 {:name {:first-name "Geoffrey S", :surname "Pitt"},
  :type "author",
  :role nil,
  :orcid "http://orcid.org/0000-0003-2246-0289"}
 {:name {:first-name "Gary L", :surname "Westbrook"},
  :type "editor",
  :role "Reviewing editor",
  :orcid nil})