@iurikura
Last active January 8, 2017 07:56
require 'nokogiri'
require 'anemone'

# Follow links at most one hop away from the start page.
opts = {
  depth_limit: 1
}

URL = "https://filmarks.com/users/hogehoge" # Replace hogehoge with your username

Anemone.crawl(URL, opts) do |anemone|
  # Only follow this user's paginated listing pages.
  anemone.focus_crawl do |page|
    page.links.keep_if { |link|
      # I'm not confident about this pattern. "\?" matches a literal "?", "\d+" the page number.
      link.to_s.match(/hogehoge\?page=\d+/)
    }
  end

  anemone.on_every_page do |page|
    doc = Nokogiri::HTML.parse(page.body)
    # I wanted to extract title and score separately but could not get that to work,
    # so the two XPath expressions are joined with "|" (union) and both kinds of
    # element are pulled out in a single loop.
    titlescores = doc.xpath('//html/body/div[3]/div[3]/div[1]/div/h3/a/text()|//html/body/div[3]/div[3]/div[1]/div/div/div[3]/a/span/text()')
    titlescores.each do |titlescore|
      p titlescore.text
    end
  end
end
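
# A minimal sketch, not part of the original script: a quick offline check of the
# pagination pattern that focus_crawl above is meant to apply. PAGE_LINK and
# pagination_link? are hypothetical names introduced here for illustration, and
# the sample URLs assume the profile is paginated with a "?page=N" query string.
# Uses only the Ruby standard library (Regexp#match? needs Ruby 2.4 or later).
PAGE_LINK = /hogehoge\?page=\d+/

# Returns true when the link looks like one of the user's paginated pages.
def pagination_link?(link)
  PAGE_LINK.match?(link.to_s)
end

p pagination_link?("https://filmarks.com/users/hogehoge?page=2") # => true
p pagination_link?("https://filmarks.com/users/hogehoge")        # => false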