- Kinesis is a family of services for streaming data ingestion and analytics
  - Ingestion through Kinesis Streams & Kinesis Firehose
  - Analysis through Kinesis Data Analytics
- Kinesis Streams
  - Stream management system
  - Capacity is measured in shards; each shard accepts 1 MB/s (up to 1,000 records/s) of writes and serves 2 MB/s of reads
  - Producers write data into the stream
    - Can be an EC2 instance, application, server, IoT device, etc.
  - Consumers receive the data
    - Can be EC2, Lambda, etc.
    - Consumers read from shards; with the Kinesis Client Library, each shard is processed by one worker of a consumer application at a time
    - Consumers can store raw data, aggregates, or results in other services (S3, Redshift, etc.)
- Kinesis Firehose
  - Buffering, batching (concatenation), and transformation of the stream are supported
  - Loads data into Redshift/S3/Elasticsearch/Splunk in near real time
- Can load into Streams or Firehose via:
  - HTTPS API calls (e.g. `PutRecord`)
  - Kinesis Producer Library (KPL), used within application code
  - Kinesis Agent (monitors log files)
    - Also handles file rotation, checkpointing, and retries
- Data can be encrypted at rest with KMS (server-side encryption) and travels over HTTPS in transit
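
The following is a rough illustration of the Kinesis Streams capacity model, written in plain Python with no calls to AWS. It computes how many shards a workload needs from the per-shard limits: 1 MB/s or 1,000 records/s of writes, and 2 MB/s of reads shared by standard consumers. It also sketches how a record could be routed to a shard. Kinesis takes the MD5 hash of a record's partition key as a 128-bit integer and sends the record to the shard whose hash key range contains it. Several details are assumptions for illustration only:

- the even split of that hash range across shards
- the example workload numbers
- the helper names `shards_needed` and `shard_for_key`

```python
import hashlib
import math

# Per-shard limits for Kinesis Streams (provisioned capacity mode).
WRITE_MB_PER_SHARD = 1.0        # ingest: 1 MB/s per shard
WRITE_RECORDS_PER_SHARD = 1000  # ingest: 1,000 records/s per shard
READ_MB_PER_SHARD = 2.0         # egress: 2 MB/s per shard, shared by standard consumers

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shards_needed(write_mb_per_s: float, records_per_s: int, consumers: int = 1) -> int:
    """Smallest shard count that satisfies the write, record-rate, and shared-read limits.

    Assumption: every standard (non enhanced fan-out) consumer reads the whole stream,
    so total read load is the write throughput times the number of consumers.
    """
    by_write = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_records = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read = math.ceil(write_mb_per_s * consumers / READ_MB_PER_SHARD)
    return max(1, by_write, by_records, by_read)


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Index of the shard that would receive a record with this partition key.

    Hypothetical helper: assumes the hash key range is split evenly across
    num_shards shards (no resharding since the stream was created).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * num_shards // HASH_KEY_SPACE


if __name__ == "__main__":
    print(shards_needed(write_mb_per_s=3.5, records_per_s=2000))               # 4
    print(shards_needed(write_mb_per_s=3.5, records_per_s=2000, consumers=3))  # 6
    for key in ["sensor-1", "sensor-2", "sensor-3"]:
        print(key, "->", shard_for_key(key, num_shards=4))
```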

| Streams | Firehose |
|---|---|
| Customizable but more complicated | Easy and simple |
| Ideal for building custom analysis applications | Ideal for loading into storage for third-party analysis |
| Shards must be provisioned by the customer (in provisioned capacity mode) | Service autoscales to meet demand |
| Data available to consumers in under a second | Data delivered after buffering, typically ~60 s |
| No pre-manipulation of the data | Data transformation & batching supported (via Lambda) |

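
Firehose-style buffering works like this: a buffer flushes when either a size threshold or a time interval is reached, whichever comes first. The buffered records are then delivered together as one concatenated object. The toy model below is a hypothetical sketch of that behavior, not Firehose's implementation. `DeliveryBuffer`, its fields, and the per-record `transform` hook (standing in for the optional Lambda transformation) are invented for illustration. Time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DeliveryBuffer:
    """Toy model of Firehose-style buffering: flush on size OR age, whichever comes first.

    size_limit_bytes and interval_s mirror the idea of Firehose buffering hints;
    the class itself is hypothetical.
    """
    size_limit_bytes: int
    interval_s: float
    transform: Callable[[bytes], bytes] = lambda record: record  # stand-in for a Lambda transform
    records: List[bytes] = field(default_factory=list)
    size_bytes: int = 0
    opened_at: Optional[float] = None
    delivered: List[bytes] = field(default_factory=list)  # each item is one delivered object

    def put(self, record: bytes, now: float) -> None:
        # If the current buffer has already aged out, flush it before accepting this record.
        self.tick(now)
        data = self.transform(record)
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(data)
        self.size_bytes += len(data)
        if self.size_bytes >= self.size_limit_bytes:
            self.flush()

    def tick(self, now: float) -> None:
        # Time-based flush: deliver once the oldest buffered record is interval_s old.
        if self.opened_at is not None and now - self.opened_at >= self.interval_s:
            self.flush()

    def flush(self) -> None:
        if self.records:
            # Concatenate buffered records into one object (e.g. one S3 object).
            self.delivered.append(b"".join(self.records))
        self.records, self.size_bytes, self.opened_at = [], 0, None


if __name__ == "__main__":
    buf = DeliveryBuffer(size_limit_bytes=10, interval_s=60,
                         transform=lambda r: r.upper() + b"\n")
    buf.put(b"a", now=0)
    buf.put(b"bcdef", now=1)
    buf.put(b"gh", now=2)       # buffer reaches 11 bytes -> size-based flush
    buf.put(b"late", now=100)
    buf.tick(now=170)           # 70 s old -> time-based flush
    print(buf.delivered)        # [b'A\nBCDEF\nGH\n', b'LATE\n']
```

The first flush happens because the third record pushes the buffer past its 10-byte limit. The late record is delivered by the 60-second interval instead.
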
- Kinesis Data Analytics
  - Query streaming data in real time with SQL
  - Send output to another stream, or on to S3/Elasticsearch/etc.
- Athena
  - Interactive query tool for S3
  - Uses Presto (distributed SQL engine) to run queries and Apache Hive DDL to define schemas
  - Projects a schema onto the data at read time (schema-on-read)
  - Data stays in S3, so there is no need to load or aggregate it first: a bolt-on query engine
  - Can query encrypted S3 objects
  - Data protected with TLS/HTTPS in flight
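
The interactive S3 query tool above (Amazon Athena) uses schema-on-read. The raw objects carry no schema; a schema is projected onto each row only when a query runs. The plain-Python sketch below illustrates that idea only. It does not use Presto or Hive, and the object keys, CSV layout, schema, and function names are all hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Tuple

# Raw objects as they might sit in S3: plain text, no schema stored alongside them.
RAW_OBJECTS: Dict[str, str] = {
    "logs/2019-01-01.csv": "alice,GET,200,512\nbob,POST,500,128\n",
    "logs/2019-01-02.csv": "carol,GET,404,64\nalice,GET,200,2048\n",
}

# A schema is just (column name, converter) pairs kept separately from the data.
Schema = List[Tuple[str, Callable[[str], object]]]

ACCESS_LOG_SCHEMA: Schema = [("user", str), ("method", str), ("status", int), ("bytes", int)]


def scan(objects: Dict[str, str], schema: Schema) -> Iterator[Dict[str, object]]:
    """Apply the schema to each raw row at read time; nothing is loaded up front."""
    for body in objects.values():
        for row in csv.reader(io.StringIO(body)):
            # zip stops at the shorter side, so a narrower schema reads fewer columns.
            yield {name: convert(value) for (name, convert), value in zip(schema, row)}


def total_bytes_by_status(objects: Dict[str, str], schema: Schema) -> Dict[int, int]:
    # Roughly: SELECT status, SUM(bytes) FROM access_logs GROUP BY status
    totals: Dict[int, int] = {}
    for row in scan(objects, schema):
        totals[row["status"]] = totals.get(row["status"], 0) + row["bytes"]
    return totals


if __name__ == "__main__":
    print(total_bytes_by_status(RAW_OBJECTS, ACCESS_LOG_SCHEMA))
    # {200: 2560, 500: 128, 404: 64}
```

You can pass a different schema, for example `[("user", str)]`, to read the same raw objects as a one-column table without rewriting them. That flexibility is the practical benefit of projecting the schema at read time.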