@jdecode
Last active September 4, 2020 11:22
DynamoDB data modelling for dapier/admino - first time DynamoDB/NoSQL, expect big and stupid mistakes

The more I watch Rick's sessions from 2017, 2018 and 2019, the more confused I get - so I guess I'll write it down.

There are 3 core steps (some have more, I want to stick to 3) to create a decent model that works well, followed by a review loop:

  1. Understand the use case + create an ERD (list entities and relations)
  2. Identify the access patterns - R/W workloads, query dimensions and aggregations
  3. Data modeling - avoid relational patterns, use 1 table (if there aren't any "documents", 1 should be fine)
  4. R.R.R = Review > Repeat > Review (go on till it makes sense)
@jdecode
Author

jdecode commented Jul 17, 2020

Dapier / Admino = DB viewer (like Adminer / phpMyAdmin / pgAdmin etc.) but running online (obviously it would only work if the DB allows connections from non-local clients) - anyway, that's not what these notes are about.

These notes are about creating the table structure in DynamoDB.

The very first odd thing I noticed when I first logged in to the DynamoDB web UI was that there is no option to "Create Database" - there is only an option to "Create Table", and this bothered me. I was like "Why wouldn't you have a 'Database'? How would I 'categorise' my tables?".

And the next thing I started reading is that you should have "One" (1) table - this is when I stopped fighting my own thoughts.
If 90% of the services of AWS/Amazon are using DynamoDB, and it is running pretty fast, then there must be some sense in what is being said.
The reason for NOT having a Database is that the DynamoDB mindset wants to keep a single table, unless a unique edge case shows up and requires a separate "table". One such example is storing large documents: a DynamoDB item is capped at 400 KB, and a single Query or Scan returns at most 1 MB of data per request, so sizeable documents (say, for 10% of the matched records) would quickly hit those limits. In such a scenario, only the "hash" (akin to a primary key) would be kept in the main table, and the document itself would live elsewhere with a reference to that key, much like an FK in another table (though it could still be done in the same table).

jdecode commented Jul 17, 2020

So let's start with the first key action:

Understand the use case + create ERD (list entities and relations)

jdecode commented Jul 18, 2020

Before creating all possible DB tables (and entities) and the relationships amongst those, I started off with a small list of 4 entities.

There would not be any "manual" sign-up/login; rather, the login/signup would be through GitHub (initially), and other providers would be added later (OAuth2 primarily).

With each new "Login using GitHub" a new session would be created, and the client/browser/mobile will receive a "token", using which further requests will be processed.

jdecode commented Jul 24, 2020

Entity 1 = User
Entity 2 = Token

User (Entity 1 / E1) will have multiple Tokens (Entity 2 / E2)

jdecode commented Jul 24, 2020

Entity 3 = Connection

A connection is an entity that:

  • Would be created by a user
  • Could be shared with other users
  • Multiple users can have different types of access (when shared): read-only, read+write, can-share-with-others (with permissions equal to or less than their own), owner
  • Would have a creator (who is also the user with the "owner" role, by default)
  • Could have multiple owners
  • Could be created by a user who may have no access to the connection in future: User A (creator/owner) makes User B an owner (now there are 2 owners), and then User B removes User A's access from the connection; now User A (despite still being the creator) would lose access to the connection, and User B would be the owner (not the creator - that info still refers to User A - but it doesn't matter)
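
To make the creator-versus-owner distinction above concrete, here is a minimal in-memory sketch. The names `Access`, `Connection`, `share` and `revoke`, and the numeric ordering of the access levels, are assumptions made for illustration only; they are not part of the actual design.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Access(IntEnum):
    # Hypothetical ordering of the access types listed above (higher = more rights).
    READ_ONLY = 1
    READ_WRITE = 2
    CAN_SHARE = 3
    OWNER = 4


@dataclass
class Connection:
    creator: str  # fixed at creation time, even if access is revoked later
    access: dict = field(default_factory=dict)  # user -> Access

    def __post_init__(self):
        # The creator starts out with the "owner" role by default.
        self.access.setdefault(self.creator, Access.OWNER)

    def share(self, granter: str, grantee: str, level: Access) -> None:
        granter_level = self.access.get(granter)
        if granter_level is None or granter_level < Access.CAN_SHARE:
            raise PermissionError(f"{granter} cannot share this connection")
        if level > granter_level:
            raise PermissionError("cannot grant more access than you hold")
        self.access[grantee] = level

    def revoke(self, owner: str, target: str) -> None:
        if self.access.get(owner) != Access.OWNER:
            raise PermissionError(f"{owner} is not an owner")
        self.access.pop(target, None)


conn = Connection(creator="user-a")
conn.share("user-a", "user-b", Access.OWNER)  # now there are 2 owners
conn.revoke("user-b", "user-a")               # User B removes User A's access
print(conn.creator, conn.access)              # creator is unchanged, access is not
```

The point of the sketch is that `creator` is plain metadata, while "who can see this connection" is answered purely from the access map.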

jdecode commented Jul 24, 2020

Entity 4 = Permission (of a user, on a connection)

As explained above about connections, the info pertaining to the "creator" of a connection will always point to the user who created that connection. However, the access - or "permission" - of a user on a connection would define which connections to show for that user, and this info in itself would be a different "Entity" in the system.

jdecode commented Jul 24, 2020

Entity 1 = User
Entity 2 = Token
Entity 3 = Connection
Entity 4 = Permission

jdecode commented Jul 24, 2020

As per my limited understanding of JWTs, they might be a better fit in this scenario, and could potentially remove the need for "Token" completely (or reduce it to a very limited capacity).

Time to read/learn more about JWTs - BRB...

jdecode commented Jul 26, 2020

JWT = JSON Web Token

JWTs would work!

jdecode commented Jul 26, 2020

Temporary note - for upcoming devTalks session on August 8th, 2020 - on the topic "DynamoDB"!

Create presentation in Prezi!

jdecode commented Aug 24, 2020

Continuing from JWT - Yes, it should work!

jdecode commented Aug 24, 2020

As a concept (soon to be tried), the following is the scenario that I am trying to implement:

  1. After social login (via GitHub only, ATM), create a JWT using the email received from GitHub (using access_token)
  2. Send the JWT (the long string) as a response to the client/browser upon successful login
  3. Expect a header with the JWT on authorised requests and check the authenticity of the token; if it is authentic, use the email it carries as a valid identifier and serve the request

jdecode commented Aug 24, 2020

With JWT in use, no other token is needed for authorisation, so the set of entities is reduced from:

Entity 1 = User
Entity 2 = Token
Entity 3 = Connection
Entity 4 = Permission

to

Entity 1 = User
Entity 2 = Connection
Entity 3 = Permission

jdecode commented Aug 24, 2020

New entity set identified

Entity 1 = User
Entity 2 = Connection
Entity 3 = Permission

From a database structure POV, the change now is that the primary key for a user would be the "email" of the logged-in user (or rather the "social email").

jdecode commented Aug 24, 2020

User will have many connections

User will have many permissions

Connection will have many permissions

jdecode commented Aug 24, 2020

From DynamoDB POV, the PK/SK structure would look something like this:

For "users":
PK = USER_{email-goes-here}
SK = TRUE or FALSE (depicting "active" status)
meta = JSON

For "connections":
PK = CONN_{ID-goes-here}
SK = TRUE or FALSE (depicting "active" status)
meta = JSON

For "permissions":
PK = PERM_{email-goes-here}
SK = {connection-ID-goes-here}
meta = JSON
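
As a rough illustration of how these three item shapes could sit in one table, here is a pure-Python sketch that uses a dict as a stand-in for the table (no AWS calls). The helper names `user_item`, `connection_item`, `permission_item`, `put` and `query`, and the contents of `meta`, are assumptions; only the PK/SK patterns come from the notes above.

```python
import json


def user_item(email: str, active: bool, meta: dict) -> dict:
    return {"PK": f"USER_{email}", "SK": str(active).upper(), "meta": json.dumps(meta)}


def connection_item(conn_id: str, active: bool, meta: dict) -> dict:
    return {"PK": f"CONN_{conn_id}", "SK": str(active).upper(), "meta": json.dumps(meta)}


def permission_item(email: str, conn_id: str, meta: dict) -> dict:
    return {"PK": f"PERM_{email}", "SK": conn_id, "meta": json.dumps(meta)}


# In-memory stand-in for the single table: items are unique per (PK, SK) pair.
table = {}


def put(item: dict) -> None:
    table[(item["PK"], item["SK"])] = item


def query(pk: str) -> list:
    # Mimics a key-condition query: all items sharing one PK, ordered by SK.
    return sorted((i for i in table.values() if i["PK"] == pk), key=lambda i: i["SK"])


put(user_item("dev@example.com", True, {"provider": "github"}))
put(connection_item("c1", True, {"creator": "dev@example.com"}))
put(connection_item("c2", True, {"creator": "other@example.com"}))
put(permission_item("dev@example.com", "c1", {"access": "owner"}))
put(permission_item("dev@example.com", "c2", {"access": "read-only"}))

# Access pattern: which connections should be shown for this user?
visible = [item["SK"] for item in query("PERM_dev@example.com")]
print(visible)
```

With this layout, "list the connections of a user" becomes a single lookup on `PERM_{email}`, and the connection details are then fetched by `CONN_{ID}`.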
