@sheepeeh
Created April 10, 2014 19:34
For a given search URL, retrieve the Omeka item IDs for all results.
require 'mechanize'

def get_ids(fname)
  # Create a Mechanize agent that follows meta-refresh redirects
  a = Mechanize.new { |agent|
    agent.follow_meta_refresh = true
  }

  # GET the search page
  puts "Getting search page.."
  # Replace SEARCH URL with the appropriate Omeka search URL
  a.get('SEARCH URL') do |page|
    puts "Getting links."
    # Collect the href of every matching link across all result pages
    items = []

    loop do
      # Keep only links that point at an item page
      page.links.each do |item|
        hrf = item.href
        puts "Evaluating #{hrf}..."
        items << hrf if hrf && hrf.include?("/items/show/")
      end

      # Follow the ">" pagination link until there is none left
      next_page = page.link_with(:text => ">")
      break unless next_page
      puts "Getting next page.."
      page = next_page.click
    end

    # Remove duplicates, strip the path prefix, and append the IDs to the file
    puts "---------\nWriting to file.."
    ids = items.uniq.map { |hrf| hrf.gsub("/items/show/", "") }
    # The block form closes the file even if writing raises an error
    File.open(fname, "a") { |f| f.puts ids }
    puts "---------\nDone."
  end
end

# Change the text in quotes for a differently named file
get_ids("omeka_ids.txt")
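
The script strips the literal "/items/show/" prefix, which leaves stray text behind if a link is absolute or carries a query string. A minimal sketch of a stricter extraction step follows, assuming item IDs are numeric. The helper name `extract_item_ids` and the sample hrefs are hypothetical and not part of the original gist.

```ruby
# Hypothetical helper: return the unique numeric IDs found in a list of hrefs.
# Assumes item links look like .../items/show/<digits>, optionally followed
# by a query string or fragment.
def extract_item_ids(hrefs)
  hrefs.compact                                         # drop links without an href
       .map { |href| href[%r{/items/show/(\d+)}, 1] }   # first capture group, or nil
       .compact                                         # drop hrefs that did not match
       .uniq                                            # keep the first occurrence of each ID
end

# Made-up sample links for illustration only
hrefs = [
  "/items/show/12",
  "/items/browse?page=2",
  nil,
  "http://example.org/items/show/7?query=map",
  "/items/show/12"
]

puts extract_item_ids(hrefs)
```

In `get_ids`, the collected `items` array could be passed through a helper like this before writing to the file, in place of the `gsub` call.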