@skatkov
Created August 14, 2012 00:49
Parsing ads from bahtsold.com
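#coding: utf-8
require 'wombat'

# Scrapes ad listings (link + title) from one bahtsold.com search results page.
class BahtsoldCrawler
  include Wombat::Crawler

  base_url "http://www.bahtsold.com"
  # Search results page; the query string carries the site's own listing filters.
  path "/en/results?category=7&ad_type=NULL&ar1=1006&c1=377&c2=NULL&6=NULL&price=2&co=Thailand-1"
  document_format :html

  # Iterate over every ad box (both CSS class variants), skipping boxes that
  # carry an inline style attribute; each match yields one hash in "application".
  application 'css=div.do_cat_ads_box2:not([style]), div.do_cat_ads_box:not([style])', :iterator do
    href 'css=div.do_cat_ads_image a @href'   # link target of the ad's image
    name 'css=div.do_cat_ads_detail a'        # text of the ad's detail link
  end
end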
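The crawler hard-codes its search path as one long query string, and the `href` values it scrapes may be relative to the site root. The sketch below, using only Ruby's standard `uri` library, rebuilds that same path from named parameters and resolves a scraped href against `base_url`. The constant names, helper names and the sample ad path `/en/ad/12345` are hypothetical; what each query parameter means on bahtsold.com is not documented here.

```ruby
require 'uri'

# Assumed to mirror BahtsoldCrawler's base_url and path query (hypothetical names).
BASE_URL = "http://www.bahtsold.com"
SEARCH_PARAMS = [
  ["category", "7"], ["ad_type", "NULL"], ["ar1", "1006"], ["c1", "377"],
  ["c2", "NULL"], ["6", "NULL"], ["price", "2"], ["co", "Thailand-1"]
]

# Build the search results path from ordered key/value pairs.
def search_path(params)
  "/en/results?" + URI.encode_www_form(params)
end

# Resolve a scraped href against the site root; absolute URLs pass through unchanged.
def absolute_ad_url(href, base = BASE_URL)
  URI.join(base, href).to_s
end

puts search_path(SEARCH_PARAMS)
puts absolute_ad_url("/en/ad/12345")   # hypothetical ad path
```

Keeping the filters in named pairs makes it easier to crawl other categories or price ranges by changing one value instead of editing the raw query string.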
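
#!/usr/bin/env ruby
require 'test/unit'
require_relative 'bahtsold'

class BahtsoldScraperTest < Test::Unit::TestCase
  # setup runs before every test method, so each test performs its own live crawl.
  def setup
    @bahtsold = BahtsoldCrawler.new.crawl
  end

  # Prints the scraped ads for manual inspection; asserts nothing.
  def test_get_text
    puts @bahtsold["application"]
  end

  # Every ad hash has exactly the "href" and "name" keys, in that order.
  def test_required_keys
    @bahtsold["application"].each { |ad| assert_equal ["href", "name"], ad.keys }
  end

  # Every ad has a link.
  def test_check_no_nil
    @bahtsold["application"].each { |ad| assert_not_nil ad["href"] }
  end

  # The search results page is expected to list 32 ads.
  def test_ads_number
    assert_equal 32, @bahtsold["application"].length
  end
end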
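Because test/unit calls `setup` before every test method, this suite fetches the live page four times, once per test. A minimal sketch of running the crawl only once and sharing the result is shown below; `CachedCrawl` is a hypothetical helper, and a fake block stands in for `BahtsoldCrawler.new.crawl` so the sketch runs offline.

```ruby
# Hypothetical helper: runs an expensive block once and reuses its result.
class CachedCrawl
  attr_reader :calls

  def initialize(&crawl)
    @crawl = crawl
    @calls = 0
  end

  # The first call runs the crawl; later calls return the stored result.
  def result
    @result ||= begin
      @calls += 1
      @crawl.call
    end
  end
end

# Offline stand-in for BahtsoldCrawler.new.crawl (fake data, not real site output).
FAKE_CRAWL = CachedCrawl.new do
  { "application" => [{ "href" => "/en/ad/1", "name" => "Sample ad" }] }
end

4.times { FAKE_CRAWL.result }   # like setup running before each of four tests
puts FAKE_CRAWL.calls           # → 1
```

In the real suite this would look like `CRAWL = CachedCrawl.new { BahtsoldCrawler.new.crawl }` at file level, with `setup` assigning `@bahtsold = CRAWL.result`. Since `||=` treats `nil` as "not yet computed", a crawl that returns `nil` would be retried on every call.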