@milk1000cc
Last active March 5, 2023 07:57
Use Vessel as an alternative to Kimurai: https://github.com/rubycdp/vessel
require 'bundler/inline'

# Install and load Vessel inline, without a separate Gemfile.
gemfile do
  source 'https://rubygems.org'
  gem 'vessel', github: 'rubycdp/vessel' # commit 3097da3daeb2b0f06182b2d4faa7693d82407538
end

# Appends '!' to the item's :h1 value, prints it, and passes the item on.
class FirstMiddleware < Vessel::Middleware
  def call(item, _)
    item[:h1] += '!'
    puts item[:h1]
    item
  end
end

# Prints the :h1 value with one more '!' appended; the item itself is unchanged.
class SecondMiddleware < Vessel::Middleware
  def call(item, _)
    puts item[:h1] + '!'
    item
  end
end

# Shared settings for all crawlers: request delay, thread limit, and middleware.
class ApplicationCrawler < Vessel::Cargo
  delay 1
  threads max: 1
  middleware 'FirstMiddleware', 'SecondMiddleware'
end
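
class ExampleCrawler < ApplicationCrawler
  start_urls 'https://example.com/'

  # Default callback: yields an item holding the text of the page's h1.
  def parse
    yield({ h1: at_css('h1').text })
  end

  # Callback invoked directly through Vessel::Engine#parse at the end of this
  # script; prints `data`, presumably the hash passed in that call.
  def parse2
    p data
  end
end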

# Run the whole crawler from its start_urls (kimurai equivalent: crawl!)
ExampleCrawler.run

# Run a single callback against one URL (kimurai equivalent: parse!)
engine = Vessel::Engine.new(ExampleCrawler)
engine.parse 'https://example.com/2', :parse2, { foo: :bar }
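
The sketch below illustrates what the two middleware above do to an item, using plain Ruby stand-ins so it runs without Vessel. It rests on an assumption not confirmed by the script itself: that middleware run in the order they are declared, each receiving the item returned by the previous one. The names `FirstStep`, `SecondStep`, and `run_chain`, and the sample heading text, are hypothetical and not part of Vessel.

```ruby
# Plain-Ruby stand-ins for FirstMiddleware and SecondMiddleware (hypothetical
# names; Vessel is not loaded here).
class FirstStep
  def call(item, _)
    item[:h1] += '!' # replaces the stored value, so later steps see the '!'
    puts item[:h1]
    item
  end
end

class SecondStep
  def call(item, _)
    puts item[:h1] + '!' # builds a new string; the item is left untouched
    item
  end
end

# Assumption: steps run in order, each receiving the previous step's result.
def run_chain(item, steps)
  steps.reduce(item) { |current, step| step.call(current, nil) }
end

# Sample heading text chosen for illustration only.
result = run_chain({ h1: 'Example Domain' }, [FirstStep.new, SecondStep.new])
```

Under that ordering assumption, the first step prints the heading with one `!` and the second prints it with two, while the item that leaves the chain carries only the single `!` added by the first step.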