-
AWS Kinesis is a managed data streaming/processing API
-
It's most commonly compared to Kafka, linkedin's open source data processing system, with the obvious difference being that all infrastructure is managed
-
The big benefit of Kinesis is the managed infrastructure. Scaling is made simple through the concept of shards.
-
For each shard, limit of 1MB/second input and 2MB/second output
-
It's also possible to add enhanced fan-out consumers, each with their own 2MB/second output limit
-
Standard data retention is 24 hours but can be increased up to 7 days. Since kinesis is replayable, this data is accessible as needed throughout the retention period.
-
Each record contains a partition key which, when hashed, determines the shard that record is sent to
-
From a messaging system point of view, the benefits are that messages will be processed in order, with the potential for massive scalability and near real time processing (around 1 second from client to consumer in our testing).
The remaining difficulty with Kinesis is in the consumer management.
- For each shard, 1 consumer is needed
- Each consumer then needs to keep track of it's position in the stream, requiring some amount of state management
- Then, when a shard is split, in order to maintain ordered processing, the records from the old shard must all be processed before the 2 new shards can begin.
- This requires additional orchestration around splitting/merging shards and stopping/starting consumers
However, AWS also provides a Kinesis Lambda trigger, this manages the entire orchestration process:
- There will be max 1 concurrent lambda invocation per shard.
- Each time the lambda runs (and successfully finishes), the position in the stream is saved
- If the lambda fails, it will be re-run from the same starting position
- Lambdas are invoked every second if records are available
- If a shard is split, a lambda will be invoked for the old shard until it is completely processed and only then will the invocations for the 2 new shards be run.
Things that Kinesis is more suitable for:
- MapReduce style operations - aggregation, counting, etc
- Ordered messaging (possible through FIFO queues)
- Multiple consumers (possible through SNS-SQS fanout)
- Delayed consumption
Things that SQS is more suitable for:
- Message level ack/fail
- Message tracking without any additional state management
- Delayed sends
- Data with the potential for spikes etc. (dynamic read-time scaling)
SQS charges per message (each 64 KB counts as one request). Kinesis charges per shard per hour (1 shard can handle up to 1000 messages or 1 MB/second) and also for the amount of data you put in (every 25 KB).
Sending 1 GB of messages per day at the maximum message size - Kinesis ~10/month > SQS ~0.20/month Sending 1 TB of messages per day at the maximum message size - Kinesis ~158/month < SQS 201/month
SQS charges $0.40 per million requests (64 KB each), so $0.00655 per GB. At 1 GB per day, this is just under $0.20 per month; at 1 TB per day, it comes to a little over $201 per month.
Kinesis charges $0.014 per million requests (25 KB each), so $0.00059 per GB. At 1 GB per day, this is less than $0.02 per month; at 1 TB per day, it is about $18 per month. However, Kinesis also charges $0.015 per shard-hour. You need at least 1 shard per 1 MB per second. At 1 GB per day, 1 shard will be plenty, so that will add another $0.36 per day, for a total cost of $10.82 per month. At 1 TB per day, you will need at least 13 shards, which adds another $4.68 per day, for a total cost of $158 per month.
- Clients hit a rest endpoint (looking at changing this to a websocket)
- Endpoint pushes event record onto a kinesis stream
- Consumer reads records off kinesis stream
- From event record, load user state, modify user state, save user state (dynamodb)
- state includes processedTimestamp to stop duplication of events
- also includes openActivities and completedActivities
- Every time state gets saved to dynamodb, another lambda gets called
- lambda checks for completed activities and sends them to SB
- lambda sends a 'heartbeat' activity to a delayed SQS queue
- Activity is pulled off the delayed SQS queue and converted to a Kinesis record