A changeset is received via osm-metronome -> dynamosm
dynamosm-validation takes the changeset and:
- updates the scoreboards
- cross-updates the scoreboards
Report
{
"reason": "User account is less than 10 days old",
// machine-readable id, used for batch changes, e.g. if we want to remove a rule
"reason_id": "qa/NEW_ACCOUNT",
"karma": -0.1,
"source": "user/200350",
"target": "changeset/4242",
"report_id": "521421"
}
Comment
{
"reason": "User account is less than 10 days old",
"reason_id": "human/COMMENT",
"karma": 0.1,
"message": "This is a perfectly valid edit",
"target": "changeset/4242",
"report_id": "521422"
}
DynamoDB
Changeset table
PRIMARY   | SECONDARY | SECONDARY      |           |       |
changeset | user      | reason_id      | report_id | karma | message
---------------------------------------------------------------------------------------
4242      | 200350    | qa/NEW_ACCOUNT | 521421    | -0.1  |
4242      | 200350    | human/COMMENT  | 521422    | 0     | This is a perfectly valid edit
Changeset rollup table
PRIMARY   | SECONDARY | SECONDARY |
changeset | user      | karma     |
------------------------------------
4242      | 200350    | -0.1      |
Feature table
PRIMARY | SECONDARY | SECONDARY | SECONDARY         |           |       |
feature | user      | changeset | reason_id         | report_id | karma |
-------------------------------------------------------------------------
324242  | 200350    | 4242      | qa/SELF_INTERSECT | 521425    | -0.4  |
Reason table
PRIMARY           | SECONDARY      |       |         |
reason_id         | target         | karma | success |
------------------------------------------------------
qa/SELF_INTERSECT | feature/324242 | -0.4  | false   |
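As a sanity check on the tables above, the rollup table's karma can be derived by summing the per-report rows for each (changeset, user) pair. A minimal sketch of that aggregation (my assumption about how the rollup is computed, not the actual dynamosm code):

```python
# Report rows copied from the example changeset table above.
reports = [
    {"changeset": "4242", "user": "200350", "reason_id": "qa/NEW_ACCOUNT", "karma": -0.1},
    {"changeset": "4242", "user": "200350", "reason_id": "human/COMMENT", "karma": 0.0},
]

# Sum karma per (changeset, user) pair to produce the rollup rows.
rollup = {}
for r in reports:
    key = (r["changeset"], r["user"])
    rollup[key] = rollup.get(key, 0.0) + r["karma"]
# rollup[("4242", "200350")] == -0.1, matching the rollup table row
```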
This is a really disaggregated schema. Tables are managed independently, which is the primary disadvantage of running multiple tables. In some tables it's okay to track multiple types of objects and use some aspect of the compound key to differentiate, but that can also lead to other problems. So my first question is whether you can ball these things up into more coherent "documents" with the relationships defined in the doc. The upshot there is that UpdateItem requests can reach in and adjust individual attributes within a document, and Query and Scan requests can use arbitrary attribute filters -- you get 5 GSIs that cost less to query, but server-side filtering without an index can still be pretty effective.
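To illustrate the UpdateItem point, here is a pure-Python stand-in that mimics what a `SET` update expression does to a nested document -- reaching in and changing one attribute without rewriting the whole item. This is only a model of the semantics, not the DynamoDB API, and the path syntax and helper name are my own:

```python
def set_attr(doc, path, value):
    """Apply a SET-style update like 'reports.521421.karma' to a nested dict."""
    *parents, leaf = path.split(".")
    node = doc
    for p in parents:
        node = node[p]
    node[leaf] = value
    return doc

# A changeset stored as one document (values from the examples above).
changeset = {"changeset": "4242", "reports": {"521421": {"karma": -0.1}}}

# Adjust a single nested attribute, as UpdateItem would server-side.
set_attr(changeset, "reports.521421.karma", -0.2)
# changeset["reports"]["521421"]["karma"] is now -0.2
```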
It usually comes down to defining what queries you want to satisfy. It might help a lot to write down all the questions you want to answer.
Just spitballing, what happens if you model the data as a single JSON object:
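One possible shape -- purely my guess at what a single-document model could look like, with all values taken from the examples above and the nesting structure an assumption:

```json
{
  "changeset": "4242",
  "user": "200350",
  "karma": -0.1,
  "reports": {
    "521421": { "reason_id": "qa/NEW_ACCOUNT", "karma": -0.1, "source": "user/200350" },
    "521422": { "reason_id": "human/COMMENT", "karma": 0, "message": "This is a perfectly valid edit" }
  },
  "features": {
    "324242": {
      "reports": { "521425": { "reason_id": "qa/SELF_INTERSECT", "karma": -0.4, "success": false } }
    }
  }
}
```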
If you stick with this disaggregated schema, some of your example tables don't appear to have unique compound keys -- for example, what's the unique pair of keys in the changeset table?
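To make the uniqueness question concrete: the two example rows in the changeset table share the same (changeset, user) pair, so that pair alone can't serve as the (partition key, sort key). A quick check, with a composite sort key like "reason_id#report_id" offered only as one hypothetical fix:

```python
# (changeset, user, reason_id, report_id) from the example changeset table.
rows = [
    ("4242", "200350", "qa/NEW_ACCOUNT", "521421"),
    ("4242", "200350", "human/COMMENT", "521422"),
]

# The (changeset, user) pairs collapse into one key: a collision.
pairs = {(c, u) for c, u, *_ in rows}
assert len(pairs) < len(rows)

# A composite sort key such as "reason_id#report_id" would keep rows distinct.
keys = {(c, f"{rid}#{rep}") for c, _, rid, rep in rows}
assert len(keys) == len(rows)
```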