Kafka Migration

Problem statement

We need to migrate Kafka from one machine to another, ideally with zero downtime.

Planned solution

  1. Spin up the new kafka environment
  2. Publish changes from production to both environments in parallel
  3. Confirm the new environment is working
  4. Flip a pointer for the feeds that run off kafka
  5. Stop publishing to the old kafka environment
  6. Remove the old kafka entirely

Code changes

Publishing to multiple kafkas

  • The ChangeFeedPillow and user_groups_db_kafka_pillow either need to be duplicated or updated to publish to multiple kafkas
  • This likely means that get_kafka_client_or_none (and get_kafka_client) need to support passing in a client ID or something, similar to the distinction between get_db() and get_db_for_doc_type()
  • You can leave the pillows ~untouched if you just figure out a way to get the list of kafkas into the processor and then update this line to send changes to all kafkas instead of just one.
  • For the exact purposes of this change, a get_all_kafkas() utility function could be the easiest way to transition, and (for now) we just treat them as peers (see the sketch after this list).
  • It would be good if this were configurable via some kind of settings dict, since we don't want to publish to multiple kafkas on all environments.
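A minimal sketch of how these pieces could fit together, assuming get_kafka_client_or_none() grows an optional name argument keyed into the KAFKAS settings dict; the name argument, ChangeFeedProcessor, and publish_change_to_kafka are illustrative stand-ins, not existing code:

from django.conf import settings

def get_all_kafkas():
    """Return a client for every configured kafka, treating them as peers."""
    # get_kafka_client_or_none is the existing helper referenced above;
    # the name= argument is the proposed extension.
    return [
        get_kafka_client_or_none(name=name)
        for name in getattr(settings, 'KAFKAS', {'default': None})
    ]

class ChangeFeedProcessor(object):
    """Stand-in for the pillow processor that currently publishes to one kafka."""

    def __init__(self):
        self._kafka_clients = [c for c in get_all_kafkas() if c is not None]

    def process_change(self, change_meta):
        # send each change to every configured kafka instead of just one;
        # publish_change_to_kafka is a placeholder for the existing publish call
        for client in self._kafka_clients:
            publish_change_to_kafka(client, change_meta)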

Settings changes

Something like the following could work, where any time someone asks for a feed that isn't in the settings it just returns the default:

KAFKAS = { 
    'default': 'kafka0.internal.commcarehq.org:9092',
    'migration': 'kafka1.internal.commcarehq.org:9092'
}
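The lookup itself could then be a small helper along these lines (get_kafka_broker is a hypothetical name; the fall-back-to-default behavior is the point):

from django.conf import settings

def get_kafka_broker(name='default'):
    """Return the broker address for `name`, falling back to the default."""
    kafkas = getattr(settings, 'KAFKAS', {})
    return kafkas.get(name, kafkas.get('default'))

With the settings above, get_kafka_broker('migration') returns the new machine, while any feed name that isn't configured falls back to kafka0.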

Reading from new feeds

  • The KafkaChangeFeed class will need to be updated so that _get_consumer isn't hard-coded to pull from the main kafka.
  • The pillows that use KafkaChangeFeed (e.g. the UCR pillow) will need to be updated to properly initialize the change feed with enough information to fetch the right kafka (a sketch follows below).
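A rough sketch of what that could look like, assuming a new kafka_name argument plus the name= extension to get_kafka_client_or_none() from above; SimpleConsumer (from kafka-python) stands in for however _get_consumer currently builds its consumer:

from kafka import SimpleConsumer

class KafkaChangeFeed(object):
    def __init__(self, topic, group_id, kafka_name='default'):
        self._topic = topic
        self._group_id = group_id
        # which entry in settings.KAFKAS this feed reads from
        self._kafka_name = kafka_name

    def _get_consumer(self):
        # resolve the kafka from settings instead of the hard-coded main one
        client = get_kafka_client_or_none(name=self._kafka_name)
        return SimpleConsumer(client, self._group_id, self._topic)

The UCR pillow would then pass kafka_name when constructing its feed, and flipping the pointer in step 4 becomes a one-line settings change.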