Skip to content

Instantly share code, notes, and snippets.

@sandfox
Created April 6, 2013 20:45
Show Gist options
  • Save sandfox/5327563 to your computer and use it in GitHub Desktop.
Save sandfox/5327563 to your computer and use it in GitHub Desktop.
Sky Thoughts

Cool/useful nice to haves things

  1. Session length per object (maybe as some sort of weird meta property usable in a later steps and created from earlier steps?)

Sessions length per object (and maybe also per session per object if max session length is specified) would be über useful as property that could be used in future steps and also just as an output from a query.

Distributed Stuff

More of a question really, How are you thinking that distributed processing is going to work? Is this going to be based on the assumption that all the data for a table exists in 1+x sky instances and every instance has a full copy of the table?

I have been (very vaguely) planning building something on top of sky that handles ensuring tables are duplicated and then if possible parallelising queries across each copy of the table. It's already got reasonably complicated on my post-it notes and i'm trying to not to end up re-inventing riak / dynamo.

Putting replication into sky core seems likes of complexity that could probably sit on top of instead depending on how you think queries are going to distributed.

@benbjohnson
Copy link

@sandfox

Re: Session lengths per object

Are you talking about doing t1 - t0 where t0 is the timestamp of the first event in a session and t1 is the timestamp of the last event in a session? That gets a little tricky since you don't know t1 until you get to the end of the session. I'm going to be implementing a few features (bookmarking, subselections and bidirectional cursors) that'll make this work. Basically, it'll let you select a value from a different event relative to the one that's currently being evaluated. For example, given an event timeline like this:

[A B C D E]
* Events are denoted with letters, session grouped with parentheses.

Currently Sky will run a query over each event like this:

EVAL A
EVAL B
EVAL C
EVAL D
EVAL E

But with the new features, you'll be able to move around. So say you're at EVAL B, you could temporarily step forward until E, grab the value of timestamp and use that value, and then go back to EVAL C as your next step and continue the query execution. Hopefully that makes sense. It also comes in handy if you're at an event and you want to go back to the previous session (or future session) and extract some information (such as previous session length or whether a certain event occurred or the object state).

Re: Distributed processing

Here's my basic notes for distributed processing. Hopefully it makes some sense. :)

The plan is to do a replicated multi-master, two-level virtual sharding scheme. Here's the breakdown of what that all means. :)

Replicated multi-master - Sky is going to use something similar to replica sets that MongoDB has. There's basically N number of groups of nodes. Each group has a master and all the other members of the group are slaves to the master. Only the master can perform writes but reads can be distributed across all nodes. Each group is responsible for a portion of the total dataset. Groups can be added ad hoc.

Two-level virtual sharding - Sky currently uses a sharding scheme to distribute data across the logical CPUs on your computer. I have a dual core with Hyperthreading which means I have 4 logical CPUs so Sky shards out events to 4 separate LevelDB databases internally (and serves each of these from 4 separate servlets). CPU shards are calculated using the modulus of the FNV1a hash of the even bits of the object identifier. FNV1a has a pretty even distribution so it works well.

The even bits are used for the CPU shards because the odd bits are going to be used for the node group shards. Striping the bits is just to keep the distributions of the sharding schemes fairly even and not interfere with one another. I'm going to have a fixed number of node group shards -- probably 256 or 512. A node group can own multiple shards so say you have 2 groups, group A will own shards 0-127, group B will own shards 128-255. If you add a third group then Sky will pull shards from each existing group and give them to the new group. Dropping a group will distribute that group's shards out to the remaining groups.

The structure of the query system is already setup to work with the distributed queries pretty well. There's an aggregate() function that sums and counts everything from each servlet and then there's a merge() function that takes the results from each aggregate() call and smashes them all together. 💥 The structure of the aggregation results and the merge results are identical so I can call out to other nodes to retrieve their aggregations and merge them like they're local aggregations.

The nice thing about Sky is that very little is shared. The only data structures that need consistency are the property files (i.e. schemas) for each table and the group membership. That's the hardest piece. I'm going to leverage the Paxos implementation from Heroku's doozerd project to make these pieces consistent.

It may sound complex but I don't think it'll be too bad (for a distributed computing problem). There are no transactions to worry about or anything crazy like that.

@benbjohnson
Copy link

Also, as far as timeline goes, I'm working on some of the other v0.3.1 issues first before distribution. The histograms (https://github.com/skydb/sky/issues/76) change some of the structure of the query execution so I wanted to get that done before messing with adding distribution.

Most of the other issues are fairly simple. Bookmarking is actually really simple (it's basically a memcpy()) and time-based sampling isn't too bad since I'm just checking the execution time and stopping early if the query goes over.

I'm also supposed to give a talk on Sky at the local Big Data meetup so I need to throw some additional eye candy into Skybox. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment