@bew
Last active June 7, 2017 23:29
# Well, here goes nothing.
require "crystagiri"
require "file_utils" # Needed for FileUtils.mkdir_p in create_and_dir.
puts "Program should be starting up now."
if File.exists?("./debug.txt")
  # File.read_lines opens the file, reads every line, and closes it again,
  # so there is no handle left to close by hand.
  debug_lines = File.read_lines("./debug.txt")
  # TODO: walk debug_lines and pick out the last integer line and the last URL line.
  # Integer lines can be converted with String#to_i; URL lines can stay as strings.
  # For now, resume from the last line, which is assumed to hold the most recent URL.
  latest_link = debug_lines.last
else
  latest_link = "http://narbonic.com/comic/july-31-august-5-2000/"
end
while latest_link != "http://narbonic.com/comic/endpaper/" # Loop over every page except the last one.
  dis_doc = Crystagiri::HTML.from_url(latest_link) # Fetch and parse the page at latest_link.
  # Use the text of the <title> tag as the directory name; at_tag returns nil if the page has no such tag.
  title_tag = dis_doc.at_tag("title")
  dir_name = title_tag ? title_tag.content.strip : "untitled"
  create_and_dir(dir_name) # Creates a directory named after the page title and moves into it.
  # TODO: use dis_doc to find the image and commentary tags, then save their contents
  # with File or FileUtils.
  # Once that is done, mark_and_retreat returns to the master dir and hands back the next page's URL.
  latest_link = mark_and_retreat(dis_doc)
end

def create_and_dir(dir_name)
  FileUtils.mkdir_p(dir_name)
  Dir.cd(dir_name)
end

def mark_and_retreat(doc : Crystagiri::HTML) : String
  # Leave the current directory, so the next create_and_dir call does not create a child of a non-master dir.
  Dir.cd("..")
  # Return the URL of the next page in chronological order on the site.
  # ASSUMPTION: the page marks that link with rel="next"; adjust the XPath to the site's real markup.
  next_node = doc.nodes.xpath_node("//*[@rel='next']")
  raise "No next link found on the page" unless next_node
  next_node["href"]
end
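
# A minimal sketch of the debug.txt parsing that the comments near the top of this file ask for.
# ASSUMPTION: debug.txt mixes integer lines and URL lines, one value per line, and URL lines
# start with "http". parse_debug_lines is a hypothetical helper; nothing calls it yet.
def parse_debug_lines(lines : Array(String)) : Tuple(Int32?, String?)
  last_int = nil
  last_url = nil
  # Walk backwards so the first match of each kind is the last one in the file.
  lines.reverse_each do |line|
    text = line.strip
    last_url ||= text if text.starts_with?("http")
    last_int ||= text.to_i? # to_i? returns nil instead of raising on non-integer text.
    break if last_int && last_url
  end
  {last_int, last_url}
end

# Possible usage, once the file has been read:
#   last_int, last_url = parse_debug_lines(File.read_lines("./debug.txt"))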