We need to reliably handle new, dynamic data that is generated continually and present it in real time. A scooter's location can change in a matter of seconds, so we need to detect that change with minimal latency in order to provide accurate asset tracking.
Time gives our data meaning. Therefore, the raw data needs to be processed sequentially and incrementally over sliding time windows.
Given the architecture diagram above:
- Source data is written to a Kinesis Data Firehose delivery stream
- Kinesis Data Firehose invokes a Lambda function to transform the incoming source data
- Kinesis Data Firehose streams the transformed data to an S3 bucket at a predefined buffer size (megabytes) or interval (seconds), whichever is reached first
- AWS Database Migration Service (DMS) reads the data from the source S3 bucket
- DMS loads the data into the target database (Amazon DynamoDB)
- DynamoDB pushes the data to consumers (our application)
Note that step 1 is subject to change depending on the structure of each vendor's data.
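Step 1 above can be sketched with boto3. The stream name and the event shape are hypothetical placeholders, since the vendors' data structure is still subject to change:

```python
import json


def encode_location_event(scooter_id, lat, lon, ts):
    """Serialize a scooter location update as a newline-delimited JSON record.

    The field names here are hypothetical, not a vendor's actual schema.
    """
    event = {"scooter_id": scooter_id, "lat": lat, "lon": lon, "ts": ts}
    return (json.dumps(event) + "\n").encode("utf-8")


def put_location_event(stream_name, event_bytes):
    """Write one record to the Kinesis Data Firehose delivery stream."""
    import boto3  # AWS SDK for Python

    firehose = boto3.client("firehose")
    return firehose.put_record(
        DeliveryStreamName=stream_name,  # e.g. "scooter-locations" (hypothetical)
        Record={"Data": event_bytes},
    )
```

Appending a newline to each record keeps concatenated records in the S3 objects separable later.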
Kinesis Data Firehose
Function: captures and automatically loads streaming data into an S3 bucket
Why: enables the capture and load of source data in near real time
Lambda
Function: transforms incoming source data into a specified format
Why: our application expects a uniform data format
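The transformation step follows the contract Firehose defines for data-transformation Lambdas: records arrive base64-encoded, and each must be returned with its original `recordId`, a `result` status, and re-encoded `data`. The normalization itself (the vendor field names being mapped) is a hypothetical sketch:

```python
import base64
import json


def handler(event, context):
    """Firehose data-transformation Lambda handler.

    Decodes each incoming record, maps vendor-specific keys to our uniform
    format (hypothetical field names), and re-encodes the result.
    """
    output = []
    for record in event["records"]:
        source = json.loads(base64.b64decode(record["data"]))
        # Hypothetical normalization: accept either of two vendor schemas.
        transformed = {
            "scooter_id": source.get("id") or source.get("vehicle_id"),
            "lat": source.get("lat") or source.get("latitude"),
            "lon": source.get("lon") or source.get("longitude"),
        }
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # "Dropped" and "ProcessingFailed" are the other statuses
            "data": base64.b64encode(
                (json.dumps(transformed) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```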
Amazon S3
Function: serves as a buffer for incoming streaming data from Kinesis Data Firehose
Why: Kinesis Data Firehose concatenates multiple incoming records and adds a UTC time prefix in the format YYYY/MM/DD/HH before writing to S3. Because each forward slash (/) creates a level in the S3 object hierarchy, DMS can locate and transport the correct group of data as specified by our buffering configuration
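The UTC time prefix is easy to reproduce, which makes it straightforward to find the objects Firehose wrote for any given hour. The bucket name below is a hypothetical placeholder:

```python
from datetime import datetime, timezone


def firehose_prefix(ts: datetime) -> str:
    """Build the UTC YYYY/MM/DD/HH prefix Firehose uses when writing to S3."""
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H")


def list_hour_objects(bucket, ts):
    """List the S3 keys Firehose wrote for the hour containing ts."""
    import boto3

    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=firehose_prefix(ts))
    return [obj["Key"] for obj in resp.get("Contents", [])]
```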
AWS Database Migration Service (DMS)
Function: reads data from the source S3 bucket and loads it into the target database
Why: transports the transformed source data
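DMS is configured rather than coded, but starting the replication task can be sketched with boto3. Every ARN, identifier, and mapping name below is a hypothetical placeholder:

```python
import json


def s3_table_mapping(schema_name="scooters", table_name="locations"):
    """Build a minimal DMS table-mapping document (names are hypothetical)."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": table_name,
            },
            "rule-action": "include",
        }]
    })


def start_s3_to_dynamodb_task(task_id, source_arn, target_arn, instance_arn):
    """Create the DMS task that moves data from S3 into DynamoDB."""
    import boto3

    dms = boto3.client("dms")
    return dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,        # S3 source endpoint
        TargetEndpointArn=target_arn,        # DynamoDB target endpoint
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",   # keep replicating new objects
        TableMappings=s3_table_mapping(),
    )
```

"full-load-and-cdc" keeps the task replicating ongoing changes after the initial load, which matches the continual nature of the source data.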
Amazon DynamoDB
Function: serves as the target database that stores the data
Why: provides the scale and performance our application demands
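A sketch of how a transformed event could land in the table. The table name, key schema, and attribute names are hypothetical assumptions, and the low-level DynamoDB API types numbers as strings under the "N" key:

```python
def to_dynamodb_item(event):
    """Convert a transformed event into a low-level DynamoDB item.

    Assumes a hypothetical key schema: scooter_id as the partition key and
    ts as the sort key, so the latest location per scooter is easy to query.
    """
    return {
        "scooter_id": {"S": event["scooter_id"]},  # partition key
        "ts": {"S": event["ts"]},                  # sort key
        "lat": {"N": str(event["lat"])},
        "lon": {"N": str(event["lon"])},
    }


def put_location(table_name, event):
    """Write one location event to the DynamoDB table."""
    import boto3

    dynamodb = boto3.client("dynamodb")
    return dynamodb.put_item(TableName=table_name, Item=to_dynamodb_item(event))
```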