Experimenting with the ORCID public data file: counting the number of different locales, and plotting the result
############################################################################
#
# Plotting the locale value from the ORCID Public Data File
# https://orcid.org/content/orcid-public-data-file
#
# Tuija Sonkkila 12.11.2013
#
############################################################################
library(ggplot2)
library(scales)
orcids <- read.table(file = "orcidlocalesAll.csv",
                     header = FALSE,
                     col.names = c("locale"),
                     colClasses = c("character"),
                     stringsAsFactors = FALSE)
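# Order the factor levels by descending frequency so the bars are drawn
# from the most common locale to the least common one
orc <- within(orcids,
              locale <- factor(locale,
                               levels = names(sort(table(locale),
                                                   decreasing = TRUE))))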
png("orcidlocales.png", width = 1024, height = 768, res = 72)
ggplot(orc, aes(x = locale, fill = locale)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(labels = comma,
                     breaks = c(1000, 10000, 100000, 200000, 300000))
dev.off()
# https://www.dropbox.com/s/hq3wfhayb1nd8wi/orcidlocales.png
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:o="http://www.orcid.org/ns/orcid"
version="2.0">
<!--
Pulling out the locale value from the ORCID Public Data File
https://orcid.org/content/orcid-public-data-file
Run with: java -jar ~/saxon9he.jar -it:main -xsl:orcidlocales.xsl -o:orcidlocales.csv
-->
<xsl:output method="text" indent="yes"/>
<xsl:template match="/" name="main">
<!--
In theory it works in one shot, as below, but in practice I split the XML files into 8 directories (50000 files each),
ran this XSL against each of them in turn and finally concatenated all output files into orcidlocalesAll.csv with cat
-->
<xsl:apply-templates select="collection('/l/xml?select=*.xml')/o:orcid-message/o:orcid-profile/o:orcid-preferences"/>
</xsl:template>
<xsl:template match="o:orcid-preferences">
<xsl:value-of select="o:locale"/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
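
The stylesheet comment mentions splitting the XML files into batches, running the XSL against each batch, and concatenating the outputs. The following is a minimal sketch of the splitting and concatenating steps only, not the author's actual commands. The function names, the `batchN` directory naming, and all paths are hypothetical; the Saxon run for each batch is left out because the stylesheet hard-codes its input directory.

```shell
#!/bin/sh
# Sketch only: function names, directory layout, and batch size are
# illustrative assumptions, not taken from the original workflow.

# Move the *.xml files in $1 into numbered subdirectories
# $2/batch1, $2/batch2, ... holding at most $3 files each.
split_into_batches() {
    src=$1; dest=$2; size=$3
    count=0; batch=1
    mkdir -p "$dest/batch$batch"
    for f in "$src"/*.xml; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        if [ "$count" -ge "$size" ]; then
            batch=$((batch + 1))         # current batch is full, start a new one
            count=0
            mkdir -p "$dest/batch$batch"
        fi
        mv "$f" "$dest/batch$batch/"
        count=$((count + 1))
    done
}

# Concatenate the per-batch CSV outputs found in $1 into the single file $2.
# $2 should live outside $1 so it is not picked up by the glob.
merge_outputs() {
    cat "$1"/*.csv > "$2"
}
```

With these definitions, something like `split_into_batches /l/xml /l/batches 50000` would prepare the batches, and `merge_outputs /l/out orcidlocalesAll.csv` would build the file that the R script reads, once the stylesheet has been run against each batch directory.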