@Tasemu
Last active December 25, 2015 18:28
HTML Scraper Practise
task :scrape => :environment do
  require 'nokogiri'
  require 'open-uri'

  text = []
  invalid = []
  errors = []

  # One anime-planet slug per line; chomp strips only the trailing newline,
  # whereas chop would also eat the last character of an unterminated final line.
  File.read("text.txt").each_line do |line|
    text << line.chomp
  end

  text.each_with_index do |series, index|
    url = "http://www.anime-planet.com/anime/" + series
    begin
      # Kernel#open accepts URLs only on Ruby < 3.0; use URI.open(url) on newer versions.
      doc = open(url)
    rescue OpenURI::HTTPError => http_error
      # bad status code returned
      # http_error.message is the numeric code and text in a string
      invalid.push(url)
      errors.push("#{url}: #{http_error.message}")
      next
    end

    data = Nokogiri::HTML(doc)
    unless data.text.include?("Synopsis:")
      invalid.push(url)
      next
    end

    title = data.at_css('.theme').text
    synopsis = data.at_css('.synopsis').text.strip
    synopsis.slice! "Synopsis:\r\n\t\t\t\t\t"
    eps = data.at_css('.type').text
    year = data.at_css('.year').text
    rating = data.at_css('.avgRating').text
    categories = data.at_css('.categories')
    genre = categories.css('li').text

    # at_css returns nil when a page has no screenshot, so guard before reading 'src'.
    image = data.at_css('#screenshots img')
    imagePath = image ? "http://www.anime-planet.com" + image['src'] : nil

    anime = Series.create({:title => title, :image => imagePath, :description => synopsis, :eps => eps, :year => year, :rating => rating})
    anime.tag_list = genre
    anime.save
    puts anime.inspect
    puts "Completed: #{index} of #{text.length}"
  end

  puts "-----Invalid URLs-----"
  puts ""
  invalid.each do |bad|
    puts bad
  end

  # HTTP error messages collected in the rescue block above.
  errors.each do |msg|
    puts msg
  end
end
a-channel
a-little-princess-sara
a-little-snow-fairy-sugar
abenobashi-mahou-shotengai
accel-world
acchi-kocchi
afro-samurai
ai-yori-aoshi
aikatsu
air
air-gear
air-master
aishiteru-ze-baby
aiura
akagi
akane-maniax
akane-iro-ni-somaru-saka
akane-iro-ni-somaru-saka-hardcore
akb0048
akb0048-next-stage
akikan
aku-no-hana
alice-academy
amaenaideyo
amaenaideyo-katsu
amagami-ss
amatsuki
amazing-nurse-nanako
amnesia
angel-beats
angel-heart
angel-sanctuary
angelic-layer
anime-mirai
ano-hi-mita-hana-no-namae-wo-bokutachi-wa-mada-shiranai
ano-natsu-de-matteru
another
antique-bakery
aoi-hana
aoi-sekai-no-chuushin-de
aquarian-age
aquarion-evol
arakawa-under-the-bridge
arakawa-under-the-bridge-x-bridge
arashi-no-yoru-ni
arata-kangatari
arata-naru-sekai
arc-the-lad
arcana-famiglia
area-88
area-no-kishi
argento-soma
aria-the-animation
arjuna
armor-hunter-mellowlink
armored-trooper-votoms
around-the-world-in-eighty-days
asa-made-jugyou-chu
asagiri-no-miko
asatte-no-houkou
ashita-no-nadja
asobi-ni-iku-yo
asobi-ni-ikuyo
astarotte-no-omocha
astarotte-no-omocha-ex
asu-no-yoichi
asura-cryin
asura-cryin-season-2
avenger
ayakashi
ayashi-no-ceres
ayu-mayu-gekijou
azumanga-daioh
azumi
b-gata-h-kei
baby-princess-3d-paradise
baccano
baka-to-test-to-shoukanjuu
baka-to-test-to-shoukanjuu-ni
baka-to-test-to-shoukanjuu-matsuri
bakemonogatari
baki-the-grappler
bakuman
bakumatsu-gijinden-roman
bakumatsu-kikansetsu-irohanihoheto
bamboo-blade
banner-of-the-stars
bartender
basilisk
basquash
battle-girls
battle-programmer-shirase
beast-saga
beck
beelzebub
beelzebub-ova
ben-to
ben-to-specials
berserk
betterman
beyblade
beyblade
binbou-shimai-monogatari
binbougami-ga
birdy-the-mighty
birdy-the-mighty-decode
black-blood-brothers
black-cat
black-lagoon
black-rock-shooter
blade
blade-of-the-immortal
blassreiter
bleach
blood-plus
blood-c
bludgeoning-angel-dokuro-chan
blue-dragon
blue-drop
blue-gender
bobobo-bo-bo-bobo
boku-no-imouto-wa-osaka-okan
boku-wa-tomodachi-ga-sukunai
bokura-ga-ita
bokurano
boogiepop-phantom
bottle-fairy
boukyaku-no-senritsu
bounen-no-xamdou
brave-10
btooom
bubblegum-crisis
bungaku-shoujo-memoire
burst-angel
bus-gamer
busou-chuugakusei
busou-renkin
busou-shinki
busou-shinki-moon-angel
buzzer-beater
c-the-money-of-soul-and-possibility-control
campione
canaan
candy-boy
canvas-2
capeta
captain-tsubasa
cardcaptor-sakura
cardfight-vanguard
carnival-phantasm
casshern-sins
cat-shit-one
chaos-head
chibi-devi
chihayafuru
chitose-get-you
chobits
chocotan
chokotto-sister
choujuu-kishin-dancougar
chousoku-henkei-gyrozetter
chouyaku-hyakuninisshu
chrome-shelled-regios
chrono-crusade
chu-bra
chuunibyou-demo-koi-ga-shitai
chuunibyou-demo-koi-ga-shitai-lite
clamp-gakuen-tanteidan
clannad
clannad-after-story
claymore
cluster-edge
cobra-the-animation
code-geass
code-geass-r2
code-e
code-breaker
comic-party
computer-kakumei
cooking-master-boy
copihan
corpse-party
cowboy-bebop
crest-of-the-stars
cromartie-high-school
cross-fight-b-daman
cross-game
crying-freeman
crystal-blaze
cuticle-tantei-inaba
cyborg-009
cyclops-shoujo-saipuu
192-168-1-2:animedatabase montaguemonro$ rake scrape --trace
** Invoke scrape (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute scrape
#<Series id: 109, title: "A-Channel", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Having been close friends for years, when young Too...", eps: "TV (12 eps)", year: "2011", rating: "3.194 out of 5 from 3,218 votes", created_at: "2013-10-17 07:09:27", updated_at: "2013-10-17 07:09:27">
Completed: 0 of 193
#<Series id: 110, title: "A Little Princess Sara", image: "http://www.anime-planet.com/images/anime/main_image...", description: "When Sara Crewe's father left to work in India, she...", eps: "TV (46 eps)", year: "1985", rating: "3.09 out of 5 from 613 votes", created_at: "2013-10-17 07:09:29", updated_at: "2013-10-17 07:09:29">
Completed: 1 of 193
#<Series id: 111, title: "A Little Snow Fairy Sugar", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Saga is an ordinary girl who lives her life by plan...", eps: "TV (24 eps)", year: "2001 - 2002", rating: "2.729 out of 5 from 1,792 votes", created_at: "2013-10-17 07:09:30", updated_at: "2013-10-17 07:09:30">
Completed: 2 of 193
#<Series id: 112, title: "Accel World", image: "http://www.anime-planet.com/images/anime/main_image...", description: "In a world where everyone connects to an online net...", eps: "TV (24 eps)", year: "2012", rating: "4.094 out of 5 from 8,833 votes", created_at: "2013-10-17 07:09:34", updated_at: "2013-10-17 07:09:34">
Completed: 4 of 193
#<Series id: 113, title: "Acchi Kocchi", image: "http://www.anime-planet.com/images/anime/main_image...", description: "No synopsis yet - check back soon!", eps: "TV (12 eps)", year: "2012", rating: "3.963 out of 5 from 3,696 votes", created_at: "2013-10-17 07:09:35", updated_at: "2013-10-17 07:09:35">
Completed: 5 of 193
#<Series id: 114, title: "Afro Samurai", image: "http://www.anime-planet.com/images/anime/main_image...", description: "In a futuristic and wild west-inspired Japan, there...", eps: "TV (5 eps)", year: "2007", rating: "3.431 out of 5 from 16,013 votes", created_at: "2013-10-17 07:09:38", updated_at: "2013-10-17 07:09:38">
Completed: 6 of 193
#<Series id: 115, title: "Ai Yori Aoshi", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Aoi Sakuraba, heir to the Sakura department store, ...", eps: "TV (24 eps)", year: "2002", rating: "3.502 out of 5 from 9,821 votes", created_at: "2013-10-17 07:09:40", updated_at: "2013-10-17 07:09:40">
Completed: 7 of 193
#<Series id: 116, title: "Aikatsu!", image: "http://www.anime-planet.com/images/anime/main_image...", description: "No synopsis yet - check back soon!", eps: "TV (50 eps)", year: "2012 - 2013", rating: "2.494 out of 5 from 203 votes", created_at: "2013-10-17 07:09:41", updated_at: "2013-10-17 07:09:41">
Completed: 8 of 193
#<Series id: 117, title: "Air", image: "http://www.anime-planet.com/images/anime/main_image...", description: "The 'girl in the sky' is a legend passed down throu...", eps: "TV (13 eps)", year: "2005", rating: "3.693 out of 5 from 17,449 votes", created_at: "2013-10-17 07:09:44", updated_at: "2013-10-17 07:09:44">
Completed: 9 of 193
#<Series id: 118, title: "Air Gear", image: "http://www.anime-planet.com/images/anime/main_image...", description: "What if roller skates had high power engines that e...", eps: "TV (25 eps)", year: "2006", rating: "3.94 out of 5 from 18,133 votes", created_at: "2013-10-17 07:09:45", updated_at: "2013-10-17 07:09:45">
Completed: 10 of 193
#<Series id: 119, title: "Air Master", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Aikawa Maki, a high school student (with a reputati...", eps: "TV (27 eps)", year: "2003 - 2004", rating: "2.807 out of 5 from 3,169 votes", created_at: "2013-10-17 07:09:47", updated_at: "2013-10-17 07:09:47">
Completed: 11 of 193
#<Series id: 120, title: "Aiura", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Kanaka, Saki and Ayuko are three girls who, as fate...", eps: "TV (12 eps x 4 min)", year: "2013", rating: "2.793 out of 5 from 853 votes", created_at: "2013-10-17 07:09:51", updated_at: "2013-10-17 07:09:51">
Completed: 13 of 193
#<Series id: 121, title: "Akagi", image: "http://www.anime-planet.com/images/anime/main_image...", description: "One stormy night, a desperate man finds himself pla...", eps: "TV (26 eps x 24 min)", year: "2005 - 2006", rating: "3.96 out of 5 from 2,889 votes", created_at: "2013-10-17 07:09:52", updated_at: "2013-10-17 07:09:52">
Completed: 14 of 193
#<Series id: 122, title: "Akane Maniax", image: "http://www.anime-planet.com/images/anime/main_image...", description: "For Akane Suzumiya, letting go of her memories and ...", eps: "OVA (3 eps)", year: "2004 - 2005", rating: "2.143 out of 5 from 2,315 votes", created_at: "2013-10-17 07:09:54", updated_at: "2013-10-17 07:09:54">
Completed: 15 of 193
#<Series id: 123, title: "Akane-Iro ni Somaru Saka", image: "http://www.anime-planet.com/images/anime/main_image...", description: "When Junichi 'Gene Killer' Nagase saved the beautif...", eps: "TV (12 eps)", year: "2008", rating: "3.113 out of 5 from 8,793 votes", created_at: "2013-10-17 07:09:56", updated_at: "2013-10-17 07:09:56">
Completed: 16 of 193
#<Series id: 124, title: "Akane-Iro ni Somaru Saka Hardcore", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Jun'ichi and his friends have been invited to vacat...", eps: "DVD Special (1 ep x 26 min)", year: "2009", rating: "2.667 out of 5 from 2,182 votes", created_at: "2013-10-17 07:09:57", updated_at: "2013-10-17 07:09:57">
Completed: 17 of 193
#<Series id: 125, title: "AKB0048", image: "http://www.anime-planet.com/images/anime/main_image...", description: "No synopsis yet - check back soon!", eps: "TV (13 eps)", year: "2012", rating: "3.181 out of 5 from 1,179 votes", created_at: "2013-10-17 07:09:58", updated_at: "2013-10-17 07:09:58">
Completed: 18 of 193
#<Series id: 126, title: "AKB0048 Next Stage", image: "http://www.anime-planet.com/images/anime/main_image...", description: "No synopsis yet - check back soon!", eps: "TV (13 eps)", year: "2013", rating: "3.924 out of 5 from 597 votes", created_at: "2013-10-17 07:10:00", updated_at: "2013-10-17 07:10:00">
Completed: 19 of 193
#<Series id: 127, title: "Akikan!", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Kakeru Daichi is an openly perverted sixteen-year-o...", eps: "TV (12 eps)", year: "2009", rating: "2.48 out of 5 from 7,627 votes", created_at: "2013-10-17 07:10:01", updated_at: "2013-10-17 07:10:01">
Completed: 20 of 193
#<Series id: 128, title: "Amaenaideyo!!", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Ikko is a Buddhist monk in training who works at hi...", eps: "TV (12 eps)", year: "2005", rating: "2.706 out of 5 from 7,608 votes", created_at: "2013-10-17 07:10:07", updated_at: "2013-10-17 07:10:07">
Completed: 23 of 193
#<Series id: 129, title: "Amaenaideyo!! Katsu!!", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Ikko's trials as a monk in training continue, as th...", eps: "TV (12 eps)", year: "2006", rating: "3.179 out of 5 from 5,293 votes", created_at: "2013-10-17 07:10:09", updated_at: "2013-10-17 07:10:09">
Completed: 24 of 193
#<Series id: 130, title: "Amagami SS", image: "http://www.anime-planet.com/images/anime/main_image...", description: "After being stood up during a date on Christmas Eve...", eps: "TV (25 eps)", year: "2010", rating: "3.903 out of 5 from 7,386 votes", created_at: "2013-10-17 07:10:10", updated_at: "2013-10-17 07:10:10">
Completed: 25 of 193
#<Series id: 131, title: "Amatsuki", image: "http://www.anime-planet.com/images/anime/main_image...", description: "When Tokidoki Rikugou donned a pair of virtual real...", eps: "TV (13 eps)", year: "2008", rating: "3.157 out of 5 from 4,268 votes", created_at: "2013-10-17 07:10:12", updated_at: "2013-10-17 07:10:12">
Completed: 26 of 193
#<Series id: 132, title: "Amazing Nurse Nanako", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Nanako works day and night to cook and clean for a ...", eps: "OVA (6 eps)", year: "1999", rating: "1.909 out of 5 from 1,454 votes", created_at: "2013-10-17 07:10:14", updated_at: "2013-10-17 07:10:14">
Completed: 27 of 193
#<Series id: 133, title: "Amnesia", image: "http://www.anime-planet.com/images/anime/main_image...", description: "No synopsis yet - check back soon!", eps: "TV (12 eps)", year: "2013", rating: "2.365 out of 5 from 2,754 votes", created_at: "2013-10-17 07:10:15", updated_at: "2013-10-17 07:10:15">
Completed: 28 of 193
#<Series id: 134, title: "Angel Beats!", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Death and reincarnation are inescapable, but what h...", eps: "TV (13 eps)", year: "2010", rating: "4.495 out of 5 from 27,286 votes", created_at: "2013-10-17 07:10:17", updated_at: "2013-10-17 07:10:17">
Completed: 29 of 193
#<Series id: 135, title: "Angel Heart", image: "http://www.anime-planet.com/images/anime/main_image...", description: "A young assassin known as Glass Heart, trained in k...", eps: "TV (50 eps)", year: "2005 - 2006", rating: "3.139 out of 5 from 1,289 votes", created_at: "2013-10-17 07:10:19", updated_at: "2013-10-17 07:10:19">
Completed: 30 of 193
#<Series id: 136, title: "Angel Sanctuary", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Setsuno Muda is a high school boy with a cruel fate...", eps: "OVA (3 eps)", year: "2000", rating: "2.409 out of 5 from 6,079 votes", created_at: "2013-10-17 07:10:20", updated_at: "2013-10-17 07:10:20">
Completed: 31 of 193
#<Series id: 137, title: "Angelic Layer", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Misaki Suzuhara is a young girl who traveled to Tok...", eps: "TV (26 eps)", year: "2001", rating: "3.32 out of 5 from 6,426 votes", created_at: "2013-10-17 07:10:22", updated_at: "2013-10-17 07:10:22">
Completed: 32 of 193
#<Series id: 138, title: "Ano Natsu de Matteru", image: "http://www.anime-planet.com/images/anime/main_image...", description: "One day, while testing out his new video camera on ...", eps: "TV (12 eps)", year: "2012", rating: "3.993 out of 5 from 4,802 votes", created_at: "2013-10-17 07:10:27", updated_at: "2013-10-17 07:10:27">
Completed: 35 of 193
#<Series id: 139, title: "Another", image: "http://www.anime-planet.com/images/anime/main_image...", description: "26 years ago, something terrible happened in a midd...", eps: "TV (12 eps)", year: "2012", rating: "4.279 out of 5 from 12,300 votes", created_at: "2013-10-17 07:10:29", updated_at: "2013-10-17 07:10:29">
Completed: 36 of 193
#<Series id: 140, title: "Antique Bakery", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Tachibana has recently quit his job at a high-class...", eps: "TV (12 eps)", year: "2008", rating: "3.189 out of 5 from 2,096 votes", created_at: "2013-10-17 07:10:31", updated_at: "2013-10-17 07:10:31">
Completed: 37 of 193
#<Series id: 141, title: "Aoi Hana", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Shy, crybaby Fumi has just transferred into Matsuok...", eps: "TV (11 eps)", year: "2009", rating: "3.335 out of 5 from 2,271 votes", created_at: "2013-10-17 07:10:32", updated_at: "2013-10-17 07:10:32">
Completed: 38 of 193
#<Series id: 142, title: "Aoi Sekai no Chuushin de", image: "http://www.anime-planet.com/images/anime/main_image...", description: "For years, the kingdoms of Segua and Ninteldo have ...", eps: "TV Special (3 eps x 24 min)", year: "2012 - 2013", rating: "2.138 out of 5 from 939 votes", created_at: "2013-10-17 07:10:34", updated_at: "2013-10-17 07:10:34">
Completed: 39 of 193
#<Series id: 143, title: "Aquarian Age", image: "http://www.anime-planet.com/images/anime/main_image...", description: "Kyouta and his friends just want to rock out all da...", eps: "TV (13 eps)", year: "2002", rating: "2.32 out of 5 from 1,331 votes", created_at: "2013-10-17 07:10:35", updated_at: "2013-10-17 07:10:35">
Completed: 40 of 193
rake aborted!
undefined method `[]' for nil:NilClass
/Users/montaguemonro/Sites/animedatabase/lib/tasks/scrape.rake:45:in `block (2 levels) in <top (required)>'
/Users/montaguemonro/Sites/animedatabase/lib/tasks/scrape.rake:14:in `each'
/Users/montaguemonro/Sites/animedatabase/lib/tasks/scrape.rake:14:in `each_with_index'
/Users/montaguemonro/Sites/animedatabase/lib/tasks/scrape.rake:14:in `block in <top (required)>'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/task.rb:236:in `call'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/task.rb:236:in `block in execute'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/task.rb:231:in `each'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/task.rb:231:in `execute'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/task.rb:175:in `block in invoke_with_call_chain'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/task.rb:168:in `invoke_with_call_chain'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/task.rb:161:in `invoke'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:149:in `invoke_task'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:106:in `block (2 levels) in top_level'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:106:in `each'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:106:in `block in top_level'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:115:in `run_with_threads'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:100:in `top_level'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:78:in `block in run'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:165:in `standard_exception_handling'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/lib/rake/application.rb:75:in `run'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/lib/ruby/gems/2.0.0/gems/rake-10.1.0/bin/rake:33:in `<top (required)>'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/bin/rake:23:in `load'
/Users/montaguemonro/.rbenv/versions/2.0.0-p247/bin/rake:23:in `<main>'
Tasks: TOP => scrape
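The trace above ends in "undefined method `[]' for nil:NilClass", which is what Ruby raises when `[]` is called on nil. In the task, the likely source is the `image['src']` lookup: Nokogiri's `at_css` returns nil when no element matches, so a page without a `#screenshots img` element would fail there. A minimal sketch of a nil-safe lookup follows. The helper name `image_url_for` is hypothetical and not part of the original task; it accepts anything that responds to `['src']`, so a plain Hash stands in for a Nokogiri node here.

```ruby
# Hypothetical helper, not part of the original rake task.
# Builds an absolute image URL, or returns nil when the node or its src is missing.
def image_url_for(image_node, base = "http://www.anime-planet.com")
  return nil if image_node.nil?          # at_css found no matching element

  src = image_node['src']
  return nil if src.nil? || src.empty?   # element present but without a usable src

  base + src
end

# A Hash stands in for a Nokogiri node in this sketch.
p image_url_for(nil)                           # => nil
p image_url_for({ 'src' => '/images/a.jpg' })  # => "http://www.anime-planet.com/images/a.jpg"
```

Whether a series without a screenshot should be saved with a nil image or skipped and added to the invalid list is a design choice the original does not settle.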