@affix
Last active October 14, 2016 10:29
A simple Pirate Bay Scraper that only scrapes a single page
#!/usr/bin/env ruby
# Simple Pirate Bay Parser
# Checks the results page and lists torrents and info
# Also gives Magnet link
# (c) 2016 Keiran Smith
# Licensed under gnu/gplv3
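require 'nokogiri'
require 'open-uri'

root_url = "https://piratebaymirror.eu"

# Kernel#open no longer opens URLs as of Ruby 3.0; URI.open is the open-uri entry point.
html = URI.open("#{root_url}/recent")
doc = Nokogiri::HTML(html)
doc.encoding = 'utf-8'
rows = doc.xpath('//table[@id="searchResult"]/tr')

# Strips the "Details for " prefix from a string (currently unused).
def sanitize(data)
  data.gsub("Details for ", "")
end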
rows.each do |row|
  # Search relative to the current row. A leading "//" would rescan the whole
  # document and print every torrent once per row.
  link = row.at_xpath('.//td/div[@class="detName"]/a')
  next unless link

  href = link['href']
  puts "#{link.content} :: #{href}"

  begin
    page = URI.open("#{root_url}#{href}")
  rescue URI::InvalidURIError
    # URI had an invalid char; try the alternate, slower method
    # using only the first two path segments.
    split = href.split('/')
    page = URI.open("#{root_url}/#{split[1]}/#{split[2]}")
  end

  pdoc = Nokogiri::HTML(page)
  pdoc.encoding = 'utf-8'

  # Guard each lookup so a page with a missing field does not raise NoMethodError.
  c1 = pdoc.xpath('//dl[@class="col1"]/dd')
  puts "Category : #{c1[0].content}" if c1[0]
  puts "Size : #{c1[2].content}" if c1[2]

  nfo = pdoc.at_xpath('//div[@class="nfo"]/pre')
  puts nfo.content if nfo

  mlink = pdoc.at_xpath('//div[@class="download"]//a')
  puts "Magnet Link : #{mlink['href']}" if mlink
  puts "--"
end
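
The fallback for hrefs that URI cannot parse can be tried without any network access. Below is a minimal sketch, assuming the detail links have the shape `/torrent/<id>/<name>` (the script only relies on the first two path segments). The helper name `detail_url` is hypothetical and is not part of the script above.

```ruby
require 'uri'

# Hypothetical helper: build the absolute detail-page URL for a result href.
# When the full href contains a character URI cannot parse (a space, for
# example), fall back to the first two path segments, as the script does.
def detail_url(root_url, href)
  URI.join(root_url, href).to_s
rescue URI::InvalidURIError
  # "/torrent/123/Some Name".split('/') yields ["", "torrent", "123", "Some Name"]
  _, section, id = href.split('/')
  URI.join(root_url, "/#{section}/#{id}").to_s
end
```

With this sketch, a clean href is returned in full, while an href whose name segment contains a space is shortened to its section and id. Doing the check before the request keeps the retry logic separate from the page fetch.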