@zonuexe
Forked from atpons/sangi.rb
Created September 26, 2012 11:59
TMCIT Exam archives downloader
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
# ++ TMCIT Exam Archives Downloader ++
# ++ I don't like regular expressions. ++
# !! At your own risk !!
# License
# Copyright (c) 2012 atpons
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to use,
# copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
require "mechanize"
require "uri"
require "fileutils" # FileUtils.mkdir_p is used in TMCITDownloader#mkdir_p
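module Atpons
  # Generic downloader: fetches an index page, then saves every file listed in @dlfiles.
  class Downloader
    # args[0]: base URI used to resolve relative links
    # args[1]: URI of the index page (defaults to args[0])
    def initialize(*args)
      @agent = Mechanize.new
      @site = args[0]
      @index_uri = args[1] || args[0]
      @index = @agent.get(@index_uri)
      @dlfiles = []
    end

    # Subclasses override this to fill @dlfiles with link targets.
    def get_links; nil; end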
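
    # Fetches every entry in @dlfiles. Files go under args[0] when it is given,
    # otherwise into the current directory.
    def download!(*args)
      save_to = args[0]
      @dlfiles.each do |f|
        uri = URI.join(@site, f)
        puts uri
        # The second argument of Mechanize#get is the query parameters, not a
        # destination, so the target directory is passed to #save instead.
        file = @agent.get(uri)
        if save_to
          file.save(File.join(save_to, file.filename))
        else
          file.save
        end
      end
    end
  end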

  # Downloader for the TMCIT exam archive index page.
  class TMCITDownloader < Downloader
    SITE_URI = "http://www.metro-cit.ac.jp/examination/"
    SITE_INDEX = "http://www.metro-cit.ac.jp/examination/honka_haihu.html"
    YEAR_RANGE = 2008..2012
    PDF_TYPE = %w(shoronbun mondai kaito)
    # Behold, this is a regular expression.
    year = YEAR_RANGE.to_a.join('|')
    subject = "[a-z]+"
    type = PDF_TYPE.join("|")
    # Matches names of the form <year>[_<subject>]_<type>.pdf; the subject is optional.
    RE_PARSE_FILENAME = /\A(#{year})(?:_(#{subject}))?_(#{type})\.pdf\z/
    ParsedFilename = Struct.new(:year, :subject, :type)

    def initialize
      @files = []
      super SITE_URI, SITE_INDEX
    end

    # Collects the href of every link on the index page that ends in "pdf".
    def get_links
      @agent.page.links_with(:href => /pdf\Z/).each do |link|
        @dlfiles << link.uri.to_s
      end
    end
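
    # Returns nil whether or not links have been collected; it downloads nothing.
    # Use #download! to fetch the files.
    def download
      nil if @dlfiles.empty?
    end

    # Creates the directory tree, collects the PDF links, downloads them, and
    # returns the list of collected hrefs.
    def download!
      mkdir_p
      get_links
      super
      @dlfiles
    end

    # Convenience entry point: builds an instance and runs #download!.
    def self.download
      @instance = new
      @instance.download!
    end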

    # No regular expression needed here.
    def mkdir_p
      puts "[*]Making directory..."
      YEAR_RANGE.each do |year|
        PDF_TYPE.each do |type|
          FileUtils.mkdir_p "#{year}/#{type}"
        end
      end
    end

    # Splits a PDF file name into year, subject and type.
    # Returns nil when the name does not match RE_PARSE_FILENAME.
    def parse_filename(str)
      m = RE_PARSE_FILENAME.match(str)
      m && ParsedFilename.new(*m.captures)
    end
  end
end

if __FILE__ == $0
  include Atpons
  TMCITDownloader.download
end
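
The script builds one directory per year and type and defines a file-name parser, but it never connects the two. The following is a minimal standalone sketch of how a parsed name could be mapped to one of those directories. It needs neither Mechanize nor network access. The names `FILENAME_RE`, `Parsed` and `target_dir`, and the sample file names in the usage note, are assumptions made for illustration and are not part of the original gist.

```ruby
# Same pattern as RE_PARSE_FILENAME above, rebuilt here so the sketch runs on its own.
YEARS = (2008..2012).to_a.join("|")
TYPES = %w(shoronbun mondai kaito).join("|")
FILENAME_RE = /\A(#{YEARS})(?:_([a-z]+))?_(#{TYPES})\.pdf\z/

# Hypothetical value object for the three captured parts.
Parsed = Struct.new(:year, :subject, :type)

# Returns a Parsed struct, or nil when the name does not match.
def parse_filename(name)
  m = FILENAME_RE.match(name)
  m && Parsed.new(*m.captures)
end

# Hypothetical helper: maps a file name to the "<year>/<type>" directory
# layout that mkdir_p creates, or nil for names that do not match.
def target_dir(name)
  parsed = parse_filename(name)
  parsed && File.join(parsed.year, parsed.type)
end
```

With made-up names, `target_dir("2012_sugaku_mondai.pdf")` gives `"2012/mondai"`, `parse_filename("2010_kaito.pdf").subject` is `nil` because the subject part is optional, and `target_dir("readme.pdf")` is `nil`.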