@nikkaroraa
Created March 5, 2021 12:26
AWS DynamoDb
  • Keys

  • Provisioned Throughput

    • Overview

      • RCU (Read Capacity Units) & WCU (Write Capacity Units)
      • In provisioned mode, tables must have provisioned read and write capacity units (on-demand mode is the alternative)
      • Read Capacity Units (RCU): throughput for reads
      • Write Capacity Units (WCU) : throughput for writes
      • Option to setup auto-scaling of throughput to meet demand
      • Throughput can be exceeded temporarily using "burst credits"
      • If the burst credits are exhausted, you'll get a "ProvisionedThroughputExceededException"
      • It's then advised to retry with exponential back-off
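A minimal sketch of exponential back-off with jitter for throttled calls. `ThrottledError` is a hypothetical stand-in for the SDK's throttling exception, and `op` is any zero-argument callable wrapping a DynamoDb request. The AWS SDKs typically already retry throttled calls on their own, so this only illustrates the idea.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(
    op: Callable[[], T],
    max_retries: int = 5,
    base: float = 0.05,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying on throttling with exponential back-off and full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return op()
        except ThrottledError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # The delay ceiling doubles each attempt (base, 2*base, 4*base, ...) up to cap;
            # the actual delay is random in [0, ceiling] so competing clients spread out.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable without real waiting.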
    • Write Capacity Units

      • One write capacity unit represents one write per second for an item up to 1 KB in size
      • If the items are larger than 1 KB, more WCUs are consumed (the item size is rounded up to the next whole KB)

      Example: we write 10 items per second, 2 KB each.

      • We need 2*10 = 20 WCU
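The WCU arithmetic above as a small helper, a sketch for standard (non-transactional) writes only:

```python
import math


def required_wcu(writes_per_second: int, item_size_kb: float) -> int:
    """WCU needed: each write costs ceil(item size / 1 KB) units."""
    units_per_write = math.ceil(item_size_kb / 1)  # round up to the next whole KB
    return writes_per_second * units_per_write


print(required_wcu(10, 2))   # the example above: 10 writes/s of 2 KB
print(required_wcu(6, 4.5))  # 4.5 KB rounds up to 5 units per write
```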
    • Strongly Consistent Read vs Eventually Consistent Read

      • Eventually Consistent Read: If we read just after a write, we may get stale data because the write hasn't finished replicating yet

      • Strongly Consistent Read: If we read just after a write, we'll get the correct data.

      • By default, DynamoDb uses Eventually Consistent Reads, but GetItem, Query & Scan provide a "ConsistentRead" parameter that you can set to True.

    • Read Capacity Units

      • One read capacity unit represents one strongly consistent read per second, or 2 eventually consistent reads per second, for an item up to 4 KB in size. Larger items consume more RCUs (the size is rounded up to the next 4 KB).
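A sketch of the RCU calculation for non-transactional reads: round the item size up to 4 KB blocks, then halve the total for eventually consistent reads.

```python
import math


def required_rcu(reads_per_second: int, item_size_kb: float, strongly_consistent: bool) -> int:
    """RCU needed for GetItem-style reads of items of a given size."""
    units_per_read = math.ceil(item_size_kb / 4)  # 4 KB blocks, rounded up
    total = reads_per_second * units_per_read
    if strongly_consistent:
        return total
    return math.ceil(total / 2)  # eventually consistent reads cost half


print(required_rcu(10, 4, strongly_consistent=True))    # 10 reads/s of 4 KB
print(required_rcu(16, 12, strongly_consistent=False))  # 16 reads/s of 12 KB
print(required_rcu(10, 6, strongly_consistent=True))    # 6 KB rounds up to 8 KB
```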
  • Basic APIs

    • Writing Data

      • PutItem: Write data to DynamoDb (create data or full replace)
      • UpdateItem: Update data in DynamoDb (partial update of attributes)
      • Atomic counters: You can use the UpdateItem operation to implement an atomic counter — a numeric attribute that is incremented, unconditionally, without interfering with other write requests.
      • Conditional Writes: Accept a write / update only if conditions are respected, otherwise reject.
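A sketch of the request parameters for an atomic counter and a conditional write, shaped for the boto3 low-level client (e.g. `client.update_item(**kwargs)` / `client.put_item(**kwargs)`). The table name and the `pk`, `views`, and `name` attributes are hypothetical.

```python
def atomic_counter_update(table: str, pk: str, amount: int = 1) -> dict:
    """UpdateItem params that increment a numeric attribute atomically."""
    return {
        "TableName": table,
        "Key": {"pk": {"S": pk}},
        # if_not_exists starts the counter at 0 the first time it is incremented
        "UpdateExpression": "SET #views = if_not_exists(#views, :zero) + :inc",
        "ExpressionAttributeNames": {"#views": "views"},
        "ExpressionAttributeValues": {":zero": {"N": "0"}, ":inc": {"N": str(amount)}},
        "ReturnValues": "UPDATED_NEW",
    }


def create_if_absent(table: str, pk: str, name: str) -> dict:
    """PutItem params that only succeed if no item with this key exists yet."""
    return {
        "TableName": table,
        "Item": {"pk": {"S": pk}, "name": {"S": name}},
        # Rejected with ConditionalCheckFailedException if the key already exists
        "ConditionExpression": "attribute_not_exists(pk)",
    }
```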
    • DeleteItem

      • Delete an individual row
      • Ability to perform a conditional delete
    • DeleteTable

      • Delete a whole table and all its items
      • Much quicker deletion than calling DeleteItem on all items.
    • Batching Writes

      • BatchWriteItem

        • Up to 25 PutItem and / or DeleteItem in one call
        • Up to 16 MB of data written
        • Up to 400 KB of data per item
      • Batching allows you to reduce latency by reducing the number of API calls made against DynamoDb

      • Operations are done in parallel for better efficiency

      • It's possible for part of a batch to fail (returned as "UnprocessedItems"), in which case we have to retry the failed items using an exponential back-off algorithm
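A sketch of batching puts into groups of 25 and re-sending `UnprocessedItems` with back-off. `send` is assumed to wrap the real call, e.g. `lambda req: client.batch_write_item(**req)`; it is injected so the loop can be tested offline. (boto3's `Table.batch_writer()` already handles chunking and resending for you.)

```python
import random
import time
from typing import Callable, Dict, List

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunks(requests: List[dict], size: int = MAX_BATCH) -> List[List[dict]]:
    """Split a list of write requests into groups of at most `size`."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]


def batch_write_all(
    table: str,
    items: List[dict],
    send: Callable[[dict], dict],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Put all items in batches of 25, re-sending UnprocessedItems with back-off."""
    requests = [{"PutRequest": {"Item": item}} for item in items]
    for batch in chunks(requests):
        pending: Dict[str, List[dict]] = {table: batch}
        for attempt in range(max_retries + 1):
            response = send({"RequestItems": pending})
            pending = response.get("UnprocessedItems") or {}
            if not pending:
                break  # the whole batch was written
            if attempt == max_retries:
                left = sum(len(reqs) for reqs in pending.values())
                raise RuntimeError(f"{left} items still unprocessed")
            sleep(random.uniform(0, min(5.0, 0.05 * 2 ** attempt)))
```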

    • Reading Data

      • GetItem

        • Read based on Primary key
        • Primary Key = HASH or HASH-RANGE
        • Eventually consistent read by default
        • Option to use strongly consistent reads (more RCU - might take longer)
        • ProjectionExpression can be specified to include only certain attributes
      • BatchGetItem

        • Up to 100 items
        • Up to 16 MB of data
        • Items are retrieved in parallel to minimize latency
    • Query

      • Query returns items based on:

        • PartitionKey value (must be "=" operator)
        • SortKey value (=, <, <=, >, >=, BETWEEN, begins_with) - optional
        • FilterExpression to further filter (applied after the items are read but before they're returned, so it doesn't reduce the RCU consumed)
      • Returns:

        • Up to 1 MB of data
        • Or the number of items specified in Limit
      • Able to do pagination on the results.

      • Can query a table, a local secondary index, or a global secondary index.
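A sketch of a Query with a partition key condition plus a `begins_with` sort key condition, and a pagination loop that follows `LastEvaluatedKey` via `ExclusiveStartKey`. The `pk` / `sk` attribute names are hypothetical; `query` would be something like `client.query` from boto3 (which also offers `client.get_paginator("query")`).

```python
from typing import Callable, Iterator, Optional


def query_kwargs(table: str, pk_value: str, sk_prefix: str, limit: Optional[int] = None) -> dict:
    """Query params: partition key must use '=', sort key may use begins_with."""
    kwargs = {
        "TableName": table,
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "pk", "#sk": "sk"},
        "ExpressionAttributeValues": {":pk": {"S": pk_value}, ":prefix": {"S": sk_prefix}},
    }
    if limit is not None:
        kwargs["Limit"] = limit  # max items evaluated per page
    return kwargs


def query_all(query: Callable[..., dict], **kwargs) -> Iterator[dict]:
    """Yield items page by page until no LastEvaluatedKey is returned."""
    while True:
        page = query(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        kwargs = {**kwargs, "ExclusiveStartKey": last_key}
```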

    • Scan

      • Scan the entire table and then filter out data (inefficient)
      • Returns up to 1 MB of data - use pagination to keep on reading
      • Consumes a lot of RCU
      • Limit the impact by using Limit, or by reducing the size of the result and pausing between requests
      • For faster performance, use parallel scans:
        • Multiple workers scan multiple segments at the same time
        • Increases the throughput and RCU consumed
        • Limit the impact of parallel scans just like you would for Scans
      • Can use a ProjectionExpression + FilterExpression (no change to RCU)
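A sketch of a parallel scan: each worker scans one `Segment` out of `TotalSegments` and pages through its own results. `scan` is assumed to be something like boto3's `client.scan`; the thread pool and segment count are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def scan_segment(scan: Callable[..., dict], table: str, segment: int, total_segments: int) -> List[dict]:
    """Read every page of one segment."""
    items: List[dict] = []
    kwargs = {"TableName": table, "Segment": segment, "TotalSegments": total_segments}
    while True:
        page = scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(scan: Callable[..., dict], table: str, total_segments: int = 4) -> List[dict]:
    """Scan all segments concurrently; more segments = more throughput and more RCU."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan, table, seg, total_segments)
            for seg in range(total_segments)
        ]
        results: List[dict] = []
        for future in futures:
            results.extend(future.result())
    return results
```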
  • Indexes (GSI + LSI)

    • LSI: Local Secondary Indexes still rely on the original hash key. When you supply a table with hash+range, think of the LSIs as hash+range1, hash+range2 ... hash+range6: you get up to 5 more range attributes to query on. LSIs also share the table's provisioned throughput.
    • GSI: Global Secondary Indexes define a new paradigm: different hash/range keys per index. This breaks the original usage of one hash key per table, which is also why, when defining a GSI, you are required to add a provisioned throughput per index and pay for it.
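A sketch of CreateTable parameters (e.g. for boto3's `client.create_table(**kwargs)`) showing the difference: the LSI reuses the table's hash key and has no throughput of its own, while the GSI has its own keys and its own `ProvisionedThroughput`. All table, index, and attribute names are hypothetical.

```python
def create_table_kwargs(table: str) -> dict:
    """CreateTable params with one LSI and one GSI (names are illustrative)."""
    return {
        "TableName": table,
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "order_id", "AttributeType": "S"},
            {"AttributeName": "created_at", "AttributeType": "S"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},
            {"AttributeName": "order_id", "KeyType": "RANGE"},
        ],
        # LSI: same HASH key as the table, alternative RANGE key, shares table throughput
        "LocalSecondaryIndexes": [{
            "IndexName": "by_created_at",
            "KeySchema": [
                {"AttributeName": "user_id", "KeyType": "HASH"},
                {"AttributeName": "created_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: different HASH/RANGE keys and its own provisioned throughput
        "GlobalSecondaryIndexes": [{
            "IndexName": "by_status",
            "KeySchema": [
                {"AttributeName": "status", "KeyType": "HASH"},
                {"AttributeName": "created_at", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }],
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }
```

LSIs can only be defined when the table is created, whereas GSIs can also be added later.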
  • DynamoDb Concurrency

    • DynamoDb has a feature called "Conditional Update / Delete"
    • That means that you can ensure an item hasn't changed before altering it.
    • That makes DynamoDb an optimistic locking / concurrency database
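A sketch of optimistic locking with a version attribute: read the item, then write only if the version is still the one you read, bumping it in the same update. If another writer got there first, the call fails with `ConditionalCheckFailedException` and you re-read and retry. The `pk`, `balance`, and `version` attributes are hypothetical; the dict is shaped for boto3's `client.update_item(**kwargs)`.

```python
def versioned_update(table: str, pk: str, new_balance: int, expected_version: int) -> dict:
    """UpdateItem params that succeed only if nobody changed the item since we read it."""
    return {
        "TableName": table,
        "Key": {"pk": {"S": pk}},
        "UpdateExpression": "SET #bal = :bal, #ver = :next",
        # The conditional part: reject the write if the version moved on
        "ConditionExpression": "#ver = :expected",
        "ExpressionAttributeNames": {"#bal": "balance", "#ver": "version"},
        "ExpressionAttributeValues": {
            ":bal": {"N": str(new_balance)},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }
```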
  • DynamoDb TTL (Time to Live)

    • TTL = automatically delete an item after an expiry date / time
    • TTL is provided at no extra cost, deletions do not use WCU / RCU
    • TTL is a background task operated by the DynamoDb service itself
    • Helps reduce storage and manage the table size over time
    • Helps adhere to regulatory norms
    • TTL is enabled per table by choosing a TTL attribute; each item stores its expiry date there as a Unix epoch timestamp in seconds (Number type)
    • DynamoDb typically deletes expired items within 48 hours of expiration
    • Deleted items due to TTL are also deleted in GSI / LSI
    • DynamoDb Streams can help recover expired items.
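A sketch of computing a TTL value (Unix epoch seconds) and the shape of the request that turns TTL on for a table (e.g. boto3's `client.update_time_to_live(**kwargs)`). The `expires_at` attribute and `Sessions` table are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch_seconds(now: datetime, ttl: timedelta) -> int:
    """Expiry for the TTL attribute: Unix epoch seconds."""
    if now.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int((now + ttl).timestamp())


def enable_ttl_kwargs(table: str, attribute: str) -> dict:
    """UpdateTimeToLive params naming the table's TTL attribute."""
    return {
        "TableName": table,
        "TimeToLiveSpecification": {"Enabled": True, "AttributeName": attribute},
    }


now = datetime(2021, 3, 5, 12, 26, tzinfo=timezone.utc)
item = {
    "pk": {"S": "session#abc"},
    # Stored as a Number; the item becomes eligible for deletion after this time
    "expires_at": {"N": str(ttl_epoch_seconds(now, timedelta(hours=1)))},
}
```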
  • DynamoDb CLI - Good to Know

    • --projection-expression: attributes to receive

    • --filter-expression: filter results

    • General CLI pagination options including DynamoDb / S3:

      • Optimization:
        • --page-size: the full dataset is still retrieved, but each API call requests less data (helps avoid timeouts)
      • Pagination:
        • --max-items: max number of results returned by the CLI. Returns a NextToken
        • --starting-token: specify the last received NextToken to keep on reading.
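A sketch that assembles an `aws dynamodb scan` command line from the options above (it only builds the argument list; it doesn't run anything). `--expression-attribute-values` is the CLI option that supplies values for placeholders in a filter; the `UserPosts` table and attribute names are hypothetical.

```python
import json
import shlex
from typing import List, Optional


def scan_command(
    table: str,
    projection: Optional[str] = None,
    filter_expr: Optional[str] = None,
    values: Optional[dict] = None,
    page_size: Optional[int] = None,
    max_items: Optional[int] = None,
    starting_token: Optional[str] = None,
) -> List[str]:
    """Build argv for `aws dynamodb scan` with projection, filter and pagination flags."""
    argv = ["aws", "dynamodb", "scan", "--table-name", table]
    if projection:
        argv += ["--projection-expression", projection]
    if filter_expr:
        argv += ["--filter-expression", filter_expr]
    if values:
        argv += ["--expression-attribute-values", json.dumps(values)]
    if page_size is not None:
        argv += ["--page-size", str(page_size)]  # smaller API calls, same full result
    if max_items is not None:
        argv += ["--max-items", str(max_items)]  # CLI stops here and prints a NextToken
    if starting_token:
        argv += ["--starting-token", starting_token]  # resume from that NextToken
    return argv


print(shlex.join(scan_command("UserPosts", projection="user_id, content", max_items=2)))
```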
  • DynamoDb Transactions

    • Transaction = Ability to Create / Update / Delete multiple rows in different tables at the same time.
    • It's an "all or nothing" type of operation.
    • Write Modes: Standard, Transactional
    • Read Modes: Eventual Consistency, Strong Consistency, Transactional
    • Transactions consume 2x the WCU / RCU
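A sketch of a TransactWriteItems request (e.g. boto3's `client.transact_write_items(**kwargs)`) that updates two rows in one table and writes a row in another, all or nothing: if the balance condition fails, none of the writes happen. The `Accounts` / `Transfers` tables and their attributes are hypothetical.

```python
def transfer_transaction(from_acct: str, to_acct: str, amount: int, transfer_id: str) -> dict:
    """TransactWriteItems params for a money transfer plus an audit record."""
    amt = {"N": str(amount)}
    names = {"#bal": "balance"}
    return {
        "TransactItems": [
            {"Update": {  # debit, only if the balance is sufficient
                "TableName": "Accounts",
                "Key": {"pk": {"S": from_acct}},
                "UpdateExpression": "SET #bal = #bal - :amt",
                "ConditionExpression": "#bal >= :amt",
                "ExpressionAttributeNames": names,
                "ExpressionAttributeValues": {":amt": amt},
            }},
            {"Update": {  # credit
                "TableName": "Accounts",
                "Key": {"pk": {"S": to_acct}},
                "UpdateExpression": "SET #bal = #bal + :amt",
                "ExpressionAttributeNames": names,
                "ExpressionAttributeValues": {":amt": amt},
            }},
            {"Put": {  # audit row in a different table, written only once
                "TableName": "Transfers",
                "Item": {"pk": {"S": transfer_id}, "from": {"S": from_acct},
                         "to": {"S": to_acct}, "amount": amt},
                "ConditionExpression": "attribute_not_exists(pk)",
            }},
        ]
    }
```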
  • DynamoDb as Session State Cache

    • It's common to use DynamoDb to store session state.
    • vs ElastiCache:
      • ElastiCache is in-memory, but DynamoDb is serverless
      • Both are key / value stores
    • vs EFS:
      • EFS must be attached to EC2 instances as a network drive
    • vs EBS & Instance Store:
      • EBS & Instance Store can only be used for local caching, not shared caching.
    • vs S3:
      • S3 is higher latency, and not meant for small objects.
  • DynamoDb Write Sharding

    • Imagine we have a voting application with 2 candidates, candidate A and candidate B.
    • If we use a partition key of candidate_id, we will run into partition issues: there are only 2 partition key values, so all the writes land on 2 hot partitions.
    • Solution: add a suffix to the partition key value (usually a random suffix, sometimes a calculated suffix).
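A sketch of both suffix strategies for the voting example. `SHARDS = 10` and the `#` separator are hypothetical choices. A random suffix spreads writes evenly; a calculated suffix (here a hash of the voter id) lets you find a given voter's shard again. Reading a candidate's total then means querying every shard key and summing.

```python
import hashlib
import random
from typing import Callable, List

SHARDS = 10  # hypothetical number of suffixes per candidate


def random_shard_key(candidate_id: str, shards: int = SHARDS,
                     rng: Callable[[int], int] = random.randrange) -> str:
    """Random suffix: e.g. 'candidate_A#7'."""
    return f"{candidate_id}#{rng(shards)}"


def calculated_shard_key(candidate_id: str, voter_id: str, shards: int = SHARDS) -> str:
    """Calculated suffix: deterministic hash of another attribute (stable across runs)."""
    digest = hashlib.sha256(voter_id.encode()).digest()
    return f"{candidate_id}#{int.from_bytes(digest[:4], 'big') % shards}"


def all_shard_keys(candidate_id: str, shards: int = SHARDS) -> List[str]:
    """Every partition key to query when aggregating a candidate's votes."""
    return [f"{candidate_id}#{i}" for i in range(shards)]
```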
  • DynamoDb - Write Types

    • Concurrent Writes
    • Conditional Writes
    • Atomic Writes
  • DynamoDb - Large Objects Pattern

  • DynamoDb Operations

    • Table Cleanup

      • Option 1: Scan + Delete = very slow, expensive, consumes RCU & WCU
      • Option 2: Drop Table + Recreate Table = fast, cheap, efficient
    • Copying a DynamoDb Table:

      • Option 1: Use AWS Data Pipeline (uses EMR)
      • Option 2: Create a backup and restore the backup into a new table name (can take some time)
      • Option 3: Scan + Write: write your own code
  • DynamoDb - Security & Other features

    • Security:
      • VPC Endpoints available to access DynamoDb without internet
      • Access fully controlled by IAM
      • Encryption at rest using KMS
      • Encryption in transit using SSL / TLS
    • Backup and Restore feature available
      • Point in time restore like RDS
      • No performance impact
    • Global Tables
      • Multi-region, fully replicated, high performance
    • AWS DMS (Database Migration Service) can be used to migrate to DynamoDb (from MongoDB, Oracle, MySQL, S3, etc.)
    • You can launch a local DynamoDb on your computer for development purposes.