@WhoAteDaCake
Created June 4, 2019 08:43
Processing pipeline

Overview

Processor

  • Individual servers that perform the tasks
  • Endpoints
    • GET /meta
      • Provides resource usage as well as name and id
    • POST /process
  • Should register with the manager
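The gist only fixes that GET /meta returns resource usage plus a name and id; a minimal sketch of such a payload in Python, where the field names `memory_kb` and `cpu_seconds` are assumptions (and `ru_maxrss` is kilobytes on Linux, bytes on macOS):

```python
import resource

def meta_payload(name: str, proc_id: str) -> dict:
    """Body a processor could return from GET /meta. The exact field
    names are assumptions; the gist only says 'resource usage as well
    as name and id'."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "name": name,
        "id": proc_id,
        "memory_kb": usage.ru_maxrss,                   # peak resident set size
        "cpu_seconds": usage.ru_utime + usage.ru_stime, # user + system time
    }
```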

Manager

Split into three parts. Uses a Redis database to store metadata.

Handler

  • Endpoints
    • POST /register
      • Register a new processor
      • Should pass a name and id
      • On register
        • The handler will ping the processor back
          • /meta to validate the name and id
          • (maybe) ping to validate the /process endpoint
    • GET /meta
      • Resource usage of handler
      • List of process data:
        • Name, id, usage stats, average time taken, items processed, connected on
    • POST /action
      • Create a new action that will be put in the queue for processors to execute
      • Usually an array of jobs together with a data entry
      • The data item is 'reduced' over the processors
    • POST /complete
      • A processor makes this request to signify the end of a job
      • The item will be:
        • If it has any processors left -> put back in the pending queue
        • Otherwise moved to the completion queue
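Assuming the item shape (action id, list of processor ids, data) given in the Database section, the /complete routing could be sketched as below; the head-of-list reduction order and the in-memory queue objects are assumptions for illustration:

```python
from collections import deque

def on_complete(item, result, pending_q, completed_q):
    """Route an item after a processor reports POST /complete.
    `item` is (action_id, remaining_processor_ids, data); `result` is
    the output of the processor that just finished, so the data entry
    is 'reduced' over the processor list one step at a time."""
    action_id, remaining, _old_data = item
    remaining = remaining[1:]  # assumption: the head processor just ran
    if remaining:              # processors left -> back to the pending queue
        pending_q.append((action_id, remaining, result))
    else:                      # done -> completion queue
        completed_q.append((action_id, result))

# An action with two jobs: the data passes through both processors.
pending, completed = deque(), deque()
on_complete(("a1", ["resize", "compress"], b"raw"), b"resized", pending, completed)
on_complete(pending.popleft(), b"small", pending, completed)
```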
Scheduler

Uses FIFO scheduling (may be updated in the future)

  • Handles input queue
    • Sends out items from the queue to processors
    • Puts the item into pending queue
  • Handles pending queue

  • Handles output queue
    • Sends responses back combined with the action id
    • Also calculates average processor execution times

TODO: kick an item off the queue if it takes too long?
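One FIFO pass over the input queue might look like this sketch; `send` is a hypothetical stand-in for the HTTP POST to a processor's /process endpoint, which the gist doesn't spell out:

```python
from collections import deque

def schedule_step(input_q, pending_q, send):
    """One FIFO scheduling step: oldest item out of the input queue,
    off to a processor, then parked in the pending queue until that
    processor calls POST /complete. Returns False when idle."""
    if not input_q:
        return False
    item = input_q.popleft()   # FIFO: oldest item first
    send(item)                 # hypothetical POST /process call
    pending_q.append(item)
    return True

# Drain a small input queue; `sent.append` plays the role of `send`.
sent = []
input_q = deque([("a1", ["resize"], b"x"), ("a2", ["crop"], b"y")])
pending_q = deque()
while schedule_step(input_q, pending_q, sent.append):
    pass
```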

Health check
  • Runs periodic health checks to make sure the processors are still working
    • Updates resource usage
  • Resends failed processes
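A sketch of one such periodic pass, assuming the item shape from the Database section; `probe(proc_id)` is a hypothetical stand-in for calling the processor's GET /meta, and requeuing a dead processor's in-flight items onto the input queue is one way to "resend failed processes":

```python
from collections import deque

def health_pass(processors, pending_q, input_q, probe):
    """One periodic health pass. `probe(proc_id)` returns fresh usage
    stats, or raises if the processor is down. A dead processor is
    dropped and its in-flight items go back to the input queue so the
    scheduler resends them."""
    for proc_id, meta in list(processors.items()):
        try:
            meta.update(probe(proc_id))   # refresh resource usage
        except Exception:
            del processors[proc_id]
            for item in [i for i in pending_q if proc_id in i[1]]:
                pending_q.remove(item)
                input_q.append(item)

# p2 fails its probe: it is dropped and its pending item is requeued.
procs = {"p1": {}, "p2": {}}
pending = deque([("a1", ["p1"], b"x"), ("a2", ["p2"], b"y")])
inp = deque()

def probe(pid):
    if pid == "p2":
        raise ConnectionError("no response from /meta")
    return {"cpu": 0.1}

health_pass(procs, pending, inp, probe)
```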

How it works

Each processor is provided with the URL of a manager. As soon as it starts, it registers itself with that manager.
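The startup registration could be built like this; per the Handler section only a name and id are sent, while JSON as the body format is an assumption:

```python
import json
import urllib.request

def build_register_request(manager_url: str, name: str, proc_id: str):
    """Build the POST /register call a processor issues on startup.
    The manager then calls the processor's /meta back to validate the
    name and id it received."""
    body = json.dumps({"name": name, "id": proc_id}).encode()
    return urllib.request.Request(
        manager_url.rstrip("/") + "/register",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_register_request("http://localhost:8000", "resize", "proc-1")
# urllib.request.urlopen(req) would then send it to the manager.
```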

Database

Could potentially be any database in the future, but I chose to use Redis

  • Processors meta
    • url
    • name
    • id
    • memory usage
    • cpu usage
    • last average time
    • processed item count
    • connected on
  • Actions queue
    • action id
    • response url
  • Input queue
    • Items here are waiting their turn to be processed
    • Contains (action id, list of processor ids, data)
  • Pending queue
    • Items here have been sent to a processor and are awaiting completion
  • Output queue
    • Items here are ready to be sent back to the issuers
    • Contains (action id, data)
  • Failed queue (FUTURE ADDITION)
    • Contains (action id, list of processor ids, data)
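The gist fixes the fields and queue names above but not the Redis keys, so the naming scheme in this sketch is an assumption:

```python
# Hypothetical Redis key layout for the structures described above.

def processor_key(proc_id: str) -> str:
    # hash: url, name, id, memory usage, cpu usage, last average time,
    # processed item count, connected on
    return f"processor:{proc_id}:meta"

def action_key(action_id: str) -> str:
    # hash: response url
    return f"action:{action_id}"

# Redis lists used as FIFO queues (e.g. LPUSH by the producer,
# RPOP by the scheduler).
QUEUES = {
    "input": "queue:input",      # items waiting their turn
    "pending": "queue:pending",  # items sent to a processor
    "output": "queue:output",    # items ready to go back to the issuers
    "failed": "queue:failed",    # future addition
}
```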

TODO

  • Dealing with failure?
    • Should an action allow specifying whether partial failure is acceptable?