We need to reliably handle new, dynamic data that is generated continually and present it in real time. A scooter's location can change in a matter of seconds, so we need to detect that change with minimal latency in order to provide accurate asset tracking.
Time gives our data meaning. Therefore, the raw data needs to be processed sequentially and incrementally over sliding time windows.
Given the architecture diagram above:
- Source data is written to a Kinesis Data Firehose delivery stream
- Kinesis Data Firehose invokes a Lambda function to transform the incoming source data
- Kinesis Data Firehose streams the transformed data to an S3 bucket at a predefined buffer size (megabytes) or interval (seconds), whichever is reached first
- AWS Database Migration Service (DMS) reads the data from the source S3 bucket
- DMS loads the data into the target database (Amazon DynamoDB)
- DynamoDB pushes the data to consumers (our application)
Note that step 1 is subject to change depending on the structure of each vendor's data.
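Step 1 above can be sketched with boto3. The stream name and the event shape are hypothetical placeholders, since the vendors' data structure is still subject to change:

```python
import json


def encode_location_event(scooter_id, lat, lon, ts):
    """Serialize a scooter location update as a newline-delimited JSON record.

    The field names here are hypothetical, not a vendor's actual schema.
    """
    event = {"scooter_id": scooter_id, "lat": lat, "lon": lon, "ts": ts}
    return (json.dumps(event) + "\n").encode("utf-8")


def put_location_event(stream_name, event_bytes):
    """Write one record to the Kinesis Data Firehose delivery stream."""
    import boto3  # AWS SDK for Python

    firehose = boto3.client("firehose")
    return firehose.put_record(
        DeliveryStreamName=stream_name,  # e.g. "scooter-locations" (hypothetical)
        Record={"Data": event_bytes},
    )
```

Appending a newline to each record keeps concatenated records in the S3 objects separable later.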
Kinesis Data Firehose
Function: captures and automatically loads streaming data into an S3 bucket
Why: enables the capture and load of source data in near real time
Lambda
Function: transforms incoming source data into a specified format
Why: our application expects a uniform data format
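The transformation step follows the contract Firehose defines for data-transformation Lambdas: records arrive base64-encoded, and each must be returned with its original `recordId`, a `result` status, and re-encoded `data`. The normalization itself (the vendor field names being mapped) is a hypothetical sketch:

```python
import base64
import json


def handler(event, context):
    """Firehose data-transformation Lambda handler.

    Decodes each incoming record, maps vendor-specific keys to our uniform
    format (hypothetical field names), and re-encodes the result.
    """
    output = []
    for record in event["records"]:
        source = json.loads(base64.b64decode(record["data"]))
        # Hypothetical normalization: accept either of two vendor schemas.
        transformed = {
            "scooter_id": source.get("id") or source.get("vehicle_id"),
            "lat": source.get("lat") or source.get("latitude"),
            "lon": source.get("lon") or source.get("longitude"),
        }
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # "Dropped" and "ProcessingFailed" are the other statuses
            "data": base64.b64encode(
                (json.dumps(transformed) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```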
Amazon S3
Function: serves as a buffer for incoming streaming data from Kinesis Data Firehose
Why: Kinesis Data Firehose concatenates multiple incoming records and adds a UTC time prefix in the format YYYY/MM/DD/HH before writing to S3. Because each forward slash (/) creates a level in the S3 object hierarchy, DMS can locate and transport the correct group of data as specified by our buffering configuration
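The UTC time prefix is easy to reproduce, which makes it straightforward to find the objects Firehose wrote for any given hour. The bucket name below is a hypothetical placeholder:

```python
from datetime import datetime, timezone


def firehose_prefix(ts: datetime) -> str:
    """Build the UTC YYYY/MM/DD/HH prefix Firehose uses when writing to S3."""
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H")


def list_hour_objects(bucket, ts):
    """List the S3 keys Firehose wrote for the hour containing ts."""
    import boto3

    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=firehose_prefix(ts))
    return [obj["Key"] for obj in resp.get("Contents", [])]
```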
AWS Database Migration Service (DMS)
Function: reads data from the source S3 bucket and loads it into the target database
Why: transports the transformed source data
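DMS is configured rather than coded, but starting the replication task can be sketched with boto3. Every ARN, identifier, and mapping name below is a hypothetical placeholder:

```python
import json


def s3_table_mapping(schema_name="scooters", table_name="locations"):
    """Build a minimal DMS table-mapping document (names are hypothetical)."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": table_name,
            },
            "rule-action": "include",
        }]
    })


def start_s3_to_dynamodb_task(task_id, source_arn, target_arn, instance_arn):
    """Create the DMS task that moves data from S3 into DynamoDB."""
    import boto3

    dms = boto3.client("dms")
    return dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,        # S3 source endpoint
        TargetEndpointArn=target_arn,        # DynamoDB target endpoint
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",   # keep replicating new objects
        TableMappings=s3_table_mapping(),
    )
```

"full-load-and-cdc" keeps the task replicating ongoing changes after the initial load, which matches the continual nature of the source data.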
Amazon DynamoDB
Function: serves as the target database that stores the data
Why: provides the scale and performance our application demands
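A sketch of how a transformed event could land in the table. The table name, key schema, and attribute names are hypothetical assumptions, and the low-level DynamoDB API types numbers as strings under the "N" key:

```python
def to_dynamodb_item(event):
    """Convert a transformed event into a low-level DynamoDB item.

    Assumes a hypothetical key schema: scooter_id as the partition key and
    ts as the sort key, so the latest location per scooter is easy to query.
    """
    return {
        "scooter_id": {"S": event["scooter_id"]},  # partition key
        "ts": {"S": event["ts"]},                  # sort key
        "lat": {"N": str(event["lat"])},
        "lon": {"N": str(event["lon"])},
    }


def put_location(table_name, event):
    """Write one location event to the DynamoDB table."""
    import boto3

    dynamodb = boto3.client("dynamodb")
    return dynamodb.put_item(TableName=table_name, Item=to_dynamodb_item(event))
```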