@tmcw
Last active April 13, 2016 09:27

A changeset is received via osm-metronome -> dynamosm.

dynamosm-validation takes the changeset and:

  • updates the scoreboards
  • cross-updates the scoreboards

Report

"reason_id" is a computer-readable id, used for batch changes, like if we want to remove a rule.

{
  "reason": "User account is less than 10 days old",
  "reason_id": "qa/NEW_ACCOUNT",
  "karma": -0.1,
  "source": "user/200350",
  "target": "changeset/4242",
  "report_id": "521421"
}
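The source and target fields pack an object type and an id into one "kind/id" string. A minimal Python sketch of loading a report, assuming every reference has the form kind/numeric-id as in the examples here. `Ref`, `Report`, `parse_ref` and `parse_report` are hypothetical names, not part of dynamosm or osm-metronome; `source` is treated as optional because the Comment example has none. The sample is plain JSON with no comment line, since JSON has no comment syntax.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ref:
    kind: str  # e.g. "user", "changeset", "feature"
    id: int


@dataclass(frozen=True)
class Report:
    reason: str
    reason_id: str
    karma: float
    source: Optional[Ref]  # absent on human comments
    target: Ref
    report_id: str


def parse_ref(value: str) -> Ref:
    # Assumption: references look like "kind/123" with a numeric id.
    kind, _, raw_id = value.partition("/")
    if not kind or not raw_id:
        raise ValueError(f"malformed reference: {value!r}")
    return Ref(kind, int(raw_id))


def parse_report(text: str) -> Report:
    data = json.loads(text)
    return Report(
        reason=data["reason"],
        reason_id=data["reason_id"],
        karma=float(data["karma"]),
        source=parse_ref(data["source"]) if "source" in data else None,
        target=parse_ref(data["target"]),
        report_id=data["report_id"],
    )


SAMPLE = """{
  "reason": "User account is less than 10 days old",
  "reason_id": "qa/NEW_ACCOUNT",
  "karma": -0.1,
  "source": "user/200350",
  "target": "changeset/4242",
  "report_id": "521421"
}"""

report = parse_report(SAMPLE)
print(report.target)  # Ref(kind='changeset', id=4242)
```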

Comment

{
  "reason": "User account is less than 10 days old",
  "reason_id": "human/COMMENT",
  "karma": 0.1,
  "message": "This is a perfectly valid edit",
  "target": "changeset/4242",
  "report_id": "521422"
}

DynamoDB

Changeset table

PRIMARY   | SECONDARY | SECONDARY      |           |       |
changeset | user      | reason_id      | report_id | karma | message
-------------------------------------------------------------------------------------------
4242      | 200350    | qa/NEW_ACCOUNT | 521421    | -0.1  |
4242      | 200350    | human/COMMENT  | 521422    | 0     | This is a perfectly valid edit

Changeset rollup table

PRIMARY   | SECONDARY | SECONDARY   |
changeset | user      | karma       |
------------------------------------
4242      | 200350    | -0.1        |
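The rollup table is the per-changeset sum of the karma column in the changeset table. A sketch of deriving it in memory from the two example rows; `rollup` is a hypothetical helper, and whether the real table is recomputed like this or updated incrementally per report is not settled here.

```python
from collections import defaultdict

# Rows from the changeset table: (changeset, user, reason_id, report_id, karma)
CHANGESET_ROWS = [
    (4242, 200350, "qa/NEW_ACCOUNT", "521421", -0.1),
    (4242, 200350, "human/COMMENT", "521422", 0.0),
]


def rollup(rows):
    """Sum karma per (changeset, user) pair."""
    totals = defaultdict(float)
    for changeset, user, _reason_id, _report_id, karma in rows:
        totals[(changeset, user)] += karma
    return dict(totals)


for (changeset, user), karma in rollup(CHANGESET_ROWS).items():
    print(changeset, user, round(karma, 2))  # 4242 200350 -0.1
```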

Feature table

PRIMARY   | SECONDARY | SECONDARY | SECONDARY         |           |       |
feature   | user      | changeset | reason_id         | report_id | karma |
---------------------------------------------------------------------------
324242    | 200350    | 4242      | qa/SELF_INTERSECT | 521425    | -0.4  |

Reason table

PRIMARY           | SECONDARY      |       |         |
reason_id         | target         | karma | success |
------------------------------------------------------
qa/SELF_INTERSECT | feature/324242 | -0.4  | false   |

For maximum query flexibility, you can create up to 5 global secondary indexes per table.

@rclark

rclark commented Apr 12, 2016

This is a really disaggregated schema. Tables are managed independently, and that is the primary disadvantage of running multiple tables. In some tables it's okay to track multiple types of objects and use some aspect of the compound key to differentiate them, but that can also lead to other problems. So my first question is whether you can ball these things up into more coherent "documents" with relationships defined in the doc. The upshot there is that UpdateItem requests can reach in and adjust individual attributes within a document, and Query and Scan requests can use arbitrary attribute filters. You get 5 GSIs, which are cheaper to query, but server-side filtering without an index can still be pretty effective.

It usually comes down to defining what queries you want to satisfy. It might help a lot to write down all the questions you want to answer.

Just spitballing, what happens if you model the data as a single JSON object:

{
  "changesetid": 4242,
  "user": 200350,
  "comments": [
    {"reason": "qa/NEW_ACCOUNT", "karma": -0.1},
    {"reason": "human/COMMENT", "karma": 0, "message": "this is a perfectly valid edit"}
  ],
  "features": [
    {"id": 34242, "user": 200350, "reason": "qa/SELF_INTERSECT", "karma": -0.4}
  ],
  "karma": -0.1
}
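To make the UpdateItem point concrete, a sketch that builds the keyword arguments for boto3's Table.update_item, appending one comment to a document shaped like the one above and adjusting its top-level karma in the same request. The table name "changesets" and the idea that the top-level karma is a running total of comment karma are assumptions; `append_comment_request` is a hypothetical helper. boto3's resource layer rejects Python floats, hence Decimal.

```python
from decimal import Decimal


def append_comment_request(changeset_id, reason, karma, message=None):
    """Build kwargs for Table.update_item that append one comment to a
    changeset document and add its karma to the document's total."""
    comment = {"reason": reason, "karma": Decimal(str(karma))}
    if message is not None:
        comment["message"] = message
    return {
        "Key": {"changesetid": changeset_id},
        # if_not_exists covers documents that have no comments list yet;
        # ADD on a number attribute increments it atomically.
        "UpdateExpression": (
            "SET #comments = list_append(if_not_exists(#comments, :empty), :new) "
            "ADD #karma :delta"
        ),
        "ExpressionAttributeNames": {"#comments": "comments", "#karma": "karma"},
        "ExpressionAttributeValues": {
            ":empty": [],
            ":new": [comment],
            ":delta": comment["karma"],
        },
    }


request = append_comment_request(4242, "qa/NEW_ACCOUNT", -0.1)
# Sent with boto3 this would be, for a hypothetical "changesets" table:
#   boto3.resource("dynamodb").Table("changesets").update_item(**request)
print(request["UpdateExpression"])
```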

If you stick with this disaggregated schema, some of your example tables don't appear to have unique compound keys. For example, what's the unique pair of keys in the changeset table?
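With a hash and range key, DynamoDB requires each (hash, range) pair to identify exactly one item, and a PutItem with an existing key replaces that item. A sketch that checks candidate key pairs against the changeset table's example rows; `duplicate_keys` is a hypothetical helper, and report_id as a range key is just one option to test.

```python
from collections import Counter

# Changeset table rows from the example above.
ROWS = [
    {"changeset": 4242, "user": 200350, "reason_id": "qa/NEW_ACCOUNT", "report_id": "521421", "karma": -0.1},
    {"changeset": 4242, "user": 200350, "reason_id": "human/COMMENT", "report_id": "521422", "karma": 0},
]


def duplicate_keys(rows, hash_key, range_key):
    """Return the (hash, range) pairs that appear more than once."""
    counts = Counter((row[hash_key], row[range_key]) for row in rows)
    return [key for key, n in counts.items() if n > 1]


print(duplicate_keys(ROWS, "changeset", "user"))       # [(4242, 200350)]
print(duplicate_keys(ROWS, "changeset", "report_id"))  # []
```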

@mcwhittemore

@rclark, here are some of the queries we talked about today that led to @tmcw's tables.

  • what is the sum karma for all features with open reports?
  • what is the sum karma for all changesets with open reports?
  • what open reports are there for x feature?
  • what open reports are there for x changeset?
  • what open reports are there for x user?
  • what is the sum karma of a user? (closed reports only?)
  • which changeset does this feature belong to?
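Written-down queries can be checked against sample data. A sketch answering two of them in memory over the feature table's example row; it ignores open versus closed reports, since that distinction isn't defined in the thread. `changeset_for_feature` and `user_karma` are hypothetical names.

```python
# Feature table rows from the example: one report against feature 324242.
FEATURE_ROWS = [
    {"feature": 324242, "user": 200350, "changeset": 4242,
     "reason_id": "qa/SELF_INTERSECT", "report_id": "521425", "karma": -0.4},
]


def changeset_for_feature(rows, feature_id):
    """Which changeset does this feature belong to? Returns the first match,
    or None if the feature has no rows."""
    for row in rows:
        if row["feature"] == feature_id:
            return row["changeset"]
    return None


def user_karma(rows, user_id):
    """Sum of karma across all rows attributed to one user."""
    return sum(row["karma"] for row in rows if row["user"] == user_id)


print(changeset_for_feature(FEATURE_ROWS, 324242))  # 4242
print(user_karma(FEATURE_ROWS, 200350))             # -0.4
```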

@rclark

rclark commented Apr 12, 2016

What distinguishes open reports from closed ones? (Also, gist comments suck for having a conversation.) Is there a ticket somewhere?

@rclark

rclark commented Apr 13, 2016

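changeset (hash) | id (range)     | user (GSI hash) | karma | reason            | message
------------------------------------------------------------------------------------------------------------
4242             | changeset:4242 | 200350          | -0.1  |                   |
4242             | feature:34242  | 200350          | -0.4  | qa/SELF_INTERSECT |
4242             | comment:1      | 200350          | -0.1  | qa/NEW_ACCOUNT    |
4242             | comment:2      | 200350          | 0     | human/COMMENT     | this is a perfectly valid edit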

... more brainstorming ...

In a table like this you can gather all the information about a single changeset with a query on the changeset id. You can get the features or comments related to a changeset with a query on the changeset id plus a begins_with(id, 'feature:') key condition. A GSI with user as the hash key and id as the range key would let you look up everything performed by a particular user.
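A sketch of those two access patterns as keyword arguments for the low-level boto3 client.query call. The table name "dynamosm" and the index name "user-id-index" are placeholders; attribute names follow the brainstormed table (changeset hash, id range, user GSI hash). The #cs, #id and #u placeholders are there because user is a DynamoDB reserved word, and using placeholders throughout avoids having to check the others.

```python
def changeset_prefix_query(table_name, changeset_id, prefix):
    """kwargs for client.query: items under one changeset whose range key
    starts with `prefix`, e.g. "feature:" or "comment:"."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "#cs = :cs AND begins_with(#id, :prefix)",
        "ExpressionAttributeNames": {"#cs": "changeset", "#id": "id"},
        # The low-level client takes typed values: N for numbers, S for strings.
        "ExpressionAttributeValues": {
            ":cs": {"N": str(changeset_id)},
            ":prefix": {"S": prefix},
        },
    }


def user_query(table_name, index_name, user_id):
    """kwargs for client.query against a GSI keyed on user (hash) + id (range)."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#u = :u",
        "ExpressionAttributeNames": {"#u": "user"},  # USER is reserved
        "ExpressionAttributeValues": {":u": {"N": str(user_id)}},
    }


features = changeset_prefix_query("dynamosm", 4242, "feature:")
by_user = user_query("dynamosm", "user-id-index", 200350)
# e.g. boto3.client("dynamodb").query(**features)
```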

@tmcw

tmcw commented Apr 13, 2016

Matthew's working on a DB schema right now.

Some questions that have been coming up:

How do we choose numbers for vibes? We might accidentally under- or over-value a certain kind of checker and create false positives.

Let's make all reporters worth 1 initially. The weights will show themselves and we'll avoid wasting time bikeshedding them.

What if a positive vibe outweighs a valid bad vibe and we miss an important change?

Let's not include any positive reporters on the first pass.
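One reading of these two first-pass decisions, sketched in Python: every negative report counts as exactly -1 whatever its karma value, and non-negative reports are ignored. This interpretation of "worth 1" and the `first_pass_score` name are assumptions, not something the thread pins down.

```python
def first_pass_score(reports):
    """Score a changeset from (reason_id, karma) pairs.

    Every reporter is weighted equally: each negative report contributes -1,
    and positive (or zero) reports are dropped for the first pass.
    """
    return -sum(1 for _reason_id, karma in reports if karma < 0)


EXAMPLE_REPORTS = [
    ("qa/NEW_ACCOUNT", -0.1),
    ("qa/SELF_INTERSECT", -0.4),
    ("human/COMMENT", 0.1),  # positive: ignored on the first pass
]

print(first_pass_score(EXAMPLE_REPORTS))  # -2
```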
