@dshorthouse
Created August 5, 2020 16:22
ZooKeys ORCID Scrape
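#!/usr/bin/env ruby
# encoding: utf-8

require 'rest_client'
require 'csv'
require 'nokogiri'
require 'colorize'

# Listing pages to scrape (values of the "p" query parameter).
page_range = 0..50
orcid_prefix = "https://orcid.org/"

# Fetch a page and return the href of every element matched by the XPath.
def get_doc_urls(url, xpath)
  html = RestClient.get(url).body
  doc = Nokogiri::HTML.parse(html)
  # compact drops matched elements that have no href attribute
  doc.xpath(xpath).map { |a| a["href"] }.compact
end

CSV.open("orcids.csv", "w") do |csv|
  page_range.each do |i|
    orcid_urls = get_doc_urls("https://zookeys.pensoft.net/browse_journal_articles.php?journal_name=zookeys&journal_id=2&p=#{i}", "//*[@class=\"inline-orcid\"]")
    # Keep only ORCID links and strip the prefix, leaving the bare iD.
    orcids = orcid_urls.select { |o| o.start_with?(orcid_prefix) }
                       .map { |o| o.delete_prefix(orcid_prefix) }
    orcids.each do |orcid|
      csv << [orcid]
      puts orcid.green
    end
  end
end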
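
The prefix-stripping step in the script can be tried without any network access. The following is a minimal sketch using only the Ruby standard library. The names `ORCID_PATTERN`, `extract_orcid`, and `extract_orcids` are hypothetical and are not part of the script. The sketch also assumes, unlike the script, that links may use `http` as well as `https`, and that duplicate iDs should be dropped.

```ruby
# Hypothetical helpers, not part of the original script.
# Assumption: an ORCID iD is four hyphen-separated groups of four
# characters, digits only except that the last character may be "X".
ORCID_PATTERN = %r{\Ahttps?://orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])/?\z}

# Return the bare iD from an ORCID link, or nil if the value is not one.
def extract_orcid(href)
  match = ORCID_PATTERN.match(href.to_s.strip)
  match && match[1]
end

# Map a list of hrefs to unique bare iDs, skipping anything that does not match.
def extract_orcids(hrefs)
  hrefs.map { |href| extract_orcid(href) }.compact.uniq
end

sample = [
  "https://orcid.org/0000-0002-1825-0097",
  "http://orcid.org/0000-0002-1825-0097",
  "https://example.org/not-an-orcid",
  nil
]

puts extract_orcids(sample)
```

Matching on the shape of the iD rather than only on the prefix means a malformed link never reaches the CSV. The check is on format only and does not verify the iD's check digit.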