Created December 27, 2010 20:42
JRuby script that logs you into GitHub, scrapes the "source" of your wiki pages (i.e. what you typed), and saves it to a file.
require 'rubygems'
require 'celerity'
require 'hpricot'
require 'htmlentities'
# You obviously need all of the above gems installed before proceeding
user, password, project = ARGV # 'tobi', 'my_password', 'liquid'
raise(ArgumentError, "jruby scrape-github-wiki <username> <password> <projectname>") unless user and project and password
# Constants
WIKI_PAGES_URL = "https://github.com/#{user}/#{project}/wiki/_pages"
BASE_URL = "https://github.com"
# Start browser
puts "Starting headless browser and scraping wiki pages..."
@b = Celerity::Browser.new
@b.goto "https://github.com/login"
# For decoding
@ent = HTMLEntities.new
begin
  @b.text_field(:name => 'login').set user
  @b.text_field(:name => 'password').set password
  @b.button(:value => "Log in").click
  @b.goto WIKI_PAGES_URL
  toc_links = @b.elements_by_xpath("//*[@id='guides']/div/div[contains(@class, 'wikistyle')]/ul/li/strong/a")
  wiki_links = []
  # We have to get the href's up front, because celerity
  # won't find the elements in its cache once we navigate away
  toc_links.each do |l|
    wiki_links << BASE_URL + l.href
  end
  wiki_text = ''
  wiki_links.each do |l|
    @b.goto l
    @b.link(:class => /btn-edit/).click
    wiki_page_title = @b.text_field(:id => "wiki_name").text
    puts "Scraping wiki page with title: #{wiki_page_title}"
    wiki_text << "Wiki Page Title: #{wiki_page_title}\n"
    wiki_text << @ent.decode(Hpricot(@b.html).at("#wiki_body").inner_html)
    wiki_text << "\n\n" + ("#" * 80) + "\n\n"
  end
rescue StandardError => e
  # String + Exception raises a TypeError, so interpolate the message instead
  puts "An error occurred: #{e.message}"
ensure
  @b.close
end
puts "Saving wiki page source to 'github_wiki_pages.txt'..."
File.open('github_wiki_pages.txt', 'w') { |f| f.write(wiki_text) }
puts "\nDone\n"
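The script decodes the contents of the edit form because the browser's HTML reports the wiki source entity-escaped. A minimal sketch of that decoding step, assuming Ruby's standard-library `CGI.unescapeHTML` as a stand-in for the `htmlentities` gem used above (it covers fewer named entities); `decode_wiki_source` is a hypothetical helper name, not part of the gist:

```ruby
require 'cgi'

# Hypothetical helper: turn entity-escaped markup (as found inside the
# edit form's HTML) back into the text the author originally typed.
# Uses the stdlib CGI.unescapeHTML rather than HTMLEntities#decode.
def decode_wiki_source(escaped)
  CGI.unescapeHTML(escaped)
end

puts decode_wiki_source('&lt;b&gt;Hi&lt;/b&gt; &amp; bye')
```

For pages that rely on less common named entities, the `htmlentities` gem from the script is probably the safer choice.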
check out my fork
Absolutely true :) There I go forgetting that they're repos. Though it represents a fine "hello world" of using the Watir/Celerity API and some super-simple Hpricot.
You can also clone your wiki like a normal GitHub repo; it's a lot simpler. :)
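The clone approach mentioned above can be sketched in Ruby as follows. It assumes the wiki is reachable at the usual `<project>.wiki.git` address and that `git` is on the PATH. The helper names `wiki_clone_url` and `clone_wiki` are made up for illustration.

```ruby
# Build the clone URL for a project's wiki repository.
# Assumption: the wiki lives at <project>.wiki.git next to the main repo.
def wiki_clone_url(user, project)
  "https://github.com/#{user}/#{project}.wiki.git"
end

# Clone the wiki into dest by shelling out to git.
# Returns true on success, false on a non-zero exit status,
# and nil if git could not be executed at all.
def clone_wiki(user, project, dest = "#{project}.wiki")
  system('git', 'clone', wiki_clone_url(user, project), dest)
end

# Example call (commented out so that loading this file has no side effects):
# clone_wiki('tobi', 'liquid')
```

Private wikis would still need credentials, which git prompts for or reads from its credential helper, so no password has to be passed on the command line as in the scraping script.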