@kaiinui
Last active March 17, 2017 19:03
CloudFlow Specification
#
# Multi-instance coding (a.k.a. workflow) powered by Amazon SQS, plus multi-instance deployment powered by Docker (cloud orchestration).
# Similar work: ruote https://github.com/jmettraux/ruote
#
# Deployment
# Deploy Docker containers for each func; define each container's role with ENV.
# TODO: how to define the number of containers?
#
# flow use cloudflow.rb # => define which flow to control
# flow up
# flow scale crawl 2 # => then save to setting?
# flow scale # => number of containers for each role
# implement:
# When cloudflow.rb is executed, it reads ENV['CLOUDFLOW_ROLE'] to find its role,
# then finds func :role_name
# and calls sqs.poll :role_name with the block given to func :role_name.
# Lock queue.id (use CloudLock) while the block runs.
# When the block has finished, unlock queue.id.
#
# crawl.retrieve will be mapped to sqs.push, with the SQS queue name and object mapping
# ISSUES:
# 1. On deployment, how do we make sure in-flight work is not lost?
# => Don't delete a message from the queue until its work block has finished; then a killed process loses nothing, because the message stays in the queue.
# 2. Global (class) variable support.
# => Just pass all class variables along with each unit of work.
# 3. Promise support
# => hogehoge arg {|data| called_when_it_gets_done}
# => Open a queue '__:rolename_done' and poll it.
# 4. Promise support with an array argument
# => hogehoge [arg] {|data| called_when_it_gets_done}
# => How to receive returned_arg? Use SQS.
# 5. [CLOSED] How to define each machine's role on Docker deployment?
# => Read it from ENV: ENV['CLOUDFLOW_ROLE'] holds the role.
# 6. Allow a different Gemfile per role, because requiring every gem/class on every machine is inefficient.
# => Is it needed? If all machines are guaranteed to be identical, things are simpler. Or provide `require_if :role_name`.
# 7. [CLOSED] How to check that the passed args are appropriate?
# => func :scrape, url: String, page: Page do |arg|
# => then simply check arg against that declaration.
# 8. [IDEA] provide monit
# 9. When passing args via SQS, how to pass a class instance, e.g. arg.page?
# => Don't pass non-primitive types. Restrict the types: String, Integer,
# 10. [IDEA] How to control the number of each worker?
# => Paxos, watching the number of queued messages
# => Would a monitoring worker do it? That would be a SPOF.
# 11. [IDEA] Handle ActiveModel objects transparently
# => When an object that is_a? ActiveModel is passed, send its #id instead of the whole object, then do Model.find(id) when the arg is received.
# 12. Define an S3 access schema, or just save the data onto the Model?
# 13. CoreOS etcd support.
# 14. A lock is needed to prevent the same message from being processed twice.
require 'cloudflow'
require_relative 'workers' # => Crawler, Scraper, Fetcher, Converter
# namespace
# TODO: is it necessary, or just class Crawler < CloudFlow::Base?
namespace :crawl do
  func :retrieve do |arg| # TODO: restrict arg schema with JSON schema? or just def
    data = Crawler.crawl arg.url # how to define S3 access schema?
    page = Page.new data
    page.save
    crawl.retrieve data.urls.map {|url| {url: url}} # Array
    crawl.scrape url: arg.url, page: page
  end
  func :scrape do |arg|
    data = Scraper.scrape arg.url
    arg.page.update_attributes! data
    # one message per image URL, so that func :fetch_image receives arg.url
    crawl.fetch_image data.image_urls.map {|url| {url: url}} do |returned_arg|
      # called when it has finished
      arg.page.update_attribute! "ready", true
    end
  end
  func :fetch_image do |arg|
    Fetcher.fetch arg.url
    crawl.convert_image url: arg.url
  end
  func :convert_image do |arg|
    Converter.convert url: arg.url # was `url: url`, but `url` is undefined in this block
  end
end
# DSL to define structure
# CloudFlow will run docker via SSH
repository 'kaiinui/cloudflow'
ssh_key '~/.ssh/id_rsa'
user 'cloudflow'
aws_key './aws.yml' # {:ACCESS_KEY, :SECRET_ACCESS_KEY, :REGION}
machine_pool ["192.0.0.1", "192.0.0.2", "192.0.0.3"]
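
The role lookup and poll loop described in the header comment can be sketched in-process as follows. This is only a rough illustration under stated assumptions: `MiniFlow` is a hypothetical module, not the cloudflow gem, and plain Ruby arrays stand in for SQS queues. The one design point it tries to show is issue 1: a message is removed only after its work block returns, so a failure mid-work leaves the message queued.

```ruby
# Hypothetical sketch: MiniFlow is NOT the cloudflow gem.
# In-memory arrays stand in for SQS queues.
module MiniFlow
  FUNCS  = {}                              # role name => work block
  QUEUES = Hash.new { |h, k| h[k] = [] }   # role name => pending messages

  # Register the block that handles messages for a role (mirrors `func :role_name`).
  def self.func(role, &block)
    FUNCS[role.to_sym] = block
  end

  # Enqueue a message for a role (stands in for sqs.push).
  def self.push(role, message)
    QUEUES[role.to_sym] << message
  end

  # Process every pending message for the role named in env['CLOUDFLOW_ROLE'].
  # Returns the number of messages processed.
  def self.run(env = ENV)
    role  = env.fetch('CLOUDFLOW_ROLE').to_sym
    work  = FUNCS.fetch(role)
    queue = QUEUES[role]
    processed = 0
    until queue.empty?
      work.call(queue.first)   # do the work first
      queue.shift              # delete the message only after the block returned
      processed += 1
    end
    processed
  end
end

results = []
MiniFlow.func(:fetch_image)   { |arg| MiniFlow.push(:convert_image, { url: arg[:url] }) }
MiniFlow.func(:convert_image) { |arg| results << "converted #{arg[:url]}" }

MiniFlow.push(:fetch_image, { url: 'http://example.com/a.png' })
MiniFlow.run({ 'CLOUDFLOW_ROLE' => 'fetch_image' })     # one container per role in the real design
MiniFlow.run({ 'CLOUDFLOW_ROLE' => 'convert_image' })
```

Passing the environment as a parameter (defaulting to `ENV`) keeps the role selection easy to exercise without touching the real process environment.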

kaiinui commented Jun 11, 2014

Writing tests

Just test the I/O of each func (so a way to call a func asynchronously in tests is needed).
E2E tests are also possible.
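
One possible shape for such an I/O test, as a minimal sketch: the func body receives its downstream namespace as a parameter, and the test passes a recording stand-in instead of the real SQS-backed one. `FuncProbe` and the injected `crawl` parameter are hypothetical and not part of cloudflow; the real `Fetcher.fetch` call is left out to keep the sketch free of network I/O.

```ruby
# Hypothetical harness: FuncProbe is illustrative and not part of cloudflow.
class FuncProbe
  attr_reader :calls

  # func_names: the downstream funcs the func under test is allowed to call.
  def initialize(*func_names)
    @calls = []   # [func name, argument] pairs emitted by the func under test
    func_names.each do |name|
      # Record the call instead of pushing a message to SQS.
      define_singleton_method(name) do |arg = nil|
        @calls << [name, arg]
        nil
      end
    end
  end
end

# A func body written so that its downstream namespace (`crawl` in the spec) is injected.
fetch_image = lambda do |arg, crawl|
  # Fetcher.fetch arg[:url] would run here in the real func.
  crawl.convert_image(url: arg[:url])
end

probe = FuncProbe.new(:convert_image)
fetch_image.call({ url: 'http://example.com/a.png' }, probe)
p probe.calls   # inspect what the func emitted downstream
```

Because only declared func names are defined on the probe, a call to an unexpected downstream func raises `NoMethodError`, which surfaces wiring mistakes early.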
