@PharkMillups
Created June 24, 2010 11:43
danoyoung # is it possible to have a "nested bucket"? Something like /<bucket>/2009/<key>
benblack # buckets don't nest
danoyoung # ok...that's what I thought thanx.
benblack # np
seancribbs # danoyoung: you can include slashes in the bucket or key name if you escape them
danoyoung # the reason I ask is that we have satellite data that we want to model where the <bucket> would be the satellite, and then all the satellite data would be organized as <year>/<key>, which would be something like 2009/001 (year/day of year)
benblack # you can still do that as seancribbs described, it just wouldn't be in another bucket
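A minimal sketch of what "escape them" could look like, assuming the Riak HTTP interface of that era, which addresses objects as `/riak/<bucket>/<key>`. The base URL and the `object_url` helper are hypothetical. Percent-encoding turns the `/` in `2009/001` into `%2F`, so the whole string stays one key inside the flat `quikscat` bucket instead of reading as a nested path. How a given Riak version decodes `%2F` is worth checking against its docs.

```python
from urllib.parse import quote

# Assumption: Riak's HTTP interface addresses objects as /riak/<bucket>/<key>.
# Host and port are hypothetical defaults; adjust for your cluster.
RIAK_BASE = "http://localhost:8098/riak"


def object_url(bucket: str, key: str) -> str:
    """Build an object URL, percent-encoding any '/' in the bucket or key.

    quote(..., safe="") encodes '/' as %2F, so "2009/001" stays a single key
    rather than being treated as an extra path segment (buckets don't nest).
    """
    return f"{RIAK_BASE}/{quote(bucket, safe='')}/{quote(key, safe='')}"


# Bucket = satellite, key = "<year>/<day of year>"
print(object_url("quikscat", "2009/001"))
# http://localhost:8098/riak/quikscat/2009%2F001
```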
danoyoung # It's more for organizational reasons... we have 600 (and growing) data sets from various satellites and would like to organize them by satellite and then by date. I'll look into that, thanx.
seancribbs # so, compose the date and satellite into a bucket name (or key name)
danoyoung # yeah, that's one option... probably something like quikscat_2008, quikscat_2009, etc.
seancribbs # right, or even quikscat_201006
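Composing the satellite and the date into a bucket name is plain string formatting. A sketch, assuming one bucket per satellite per year (`quikscat_2009`) or per month (`quikscat_201006`); the `bucket_for` helper and its `monthly` flag are hypothetical, not part of any Riak client.

```python
from datetime import date


def bucket_for(satellite: str, day: date, monthly: bool = False) -> str:
    """Compose a bucket name such as 'quikscat_2009' or 'quikscat_201006'.

    Hypothetical helper: yearly buckets by default, monthly when monthly=True.
    """
    suffix = day.strftime("%Y%m") if monthly else day.strftime("%Y")
    return f"{satellite}_{suffix}"


print(bucket_for("quikscat", date(2009, 3, 14)))                # quikscat_2009
print(bucket_for("quikscat", date(2010, 6, 24), monthly=True))  # quikscat_201006
```

Coarser buckets (per year) keep the bucket count small; finer ones (per month) just shift more of the date into the bucket name. Either way the name is derived from the data, not stored anywhere.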
danoyoung # We have a measurement for every day of that particular year, so I was thinking about a key that would represent the day of year...
seancribbs # that would work too
danoyoung # so something like quikscat_2009/001, etc...
seancribbs # the important thing is being able to derive the key, or to have a way to find it easily. Sounds like your scheme will work well
danoyoung # most of the data would be looked up via dataset, year, and then a particular day within that year...
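"Being able to derive the key" means the lookup by dataset, year, and day needs no index: the location is a pure function of those values. A sketch, assuming the bucket is `<dataset>_<year>` and the key is the zero-padded day of year (`001` to `366`); the helper names are hypothetical. The reverse function shows the scheme round-trips, including leap years.

```python
from datetime import date, timedelta


def day_key(day: date) -> str:
    """Key for one daily measurement: zero-padded day of year, '001'..'366'."""
    return f"{day.timetuple().tm_yday:03d}"


def locate(dataset: str, day: date) -> tuple[str, str]:
    """Derive (bucket, key) straight from dataset and date; no index needed."""
    return f"{dataset}_{day.year}", day_key(day)


def date_from(bucket: str, key: str) -> date:
    """Reverse the scheme: recover the calendar date from bucket and key."""
    year = int(bucket.rsplit("_", 1)[1])
    return date(year, 1, 1) + timedelta(days=int(key) - 1)


print(locate("quikscat", date(2009, 1, 1)))    # ('quikscat_2009', '001')
print(locate("quikscat", date(2008, 12, 31)))  # ('quikscat_2008', '366'), leap year
print(date_from("quikscat_2008", "366"))       # 2008-12-31
```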
danoyoung # any other advanced queries, like "show me all of the vertical polarization measurements for the quikscat dataset for time range X", would be handled via map/reduce... or at least that's what I'm thinking currently. I plan on front-ending Riak with ElasticSearch for these types of queries until Riak Search comes out and I see what that looks like.
seancribbs # and that would be easy to specify the inputs for
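Because keys are derivable, the inputs for a time-range map/reduce job can be generated rather than discovered by listing keys. Riak's MapReduce job format accepts `inputs` as a list of `[bucket, key]` pairs; this sketch builds only that part, assuming the hypothetical `<dataset>_<year>` / day-of-year scheme. The map and reduce phases (for example, filtering for vertical polarization) would go in the job's `query` list and are not shown.

```python
import json
from datetime import date, timedelta


def mapreduce_inputs(dataset: str, start: date, end: date) -> list[list[str]]:
    """List every [bucket, key] pair for the days from start to end, inclusive.

    Assumes the hypothetical <dataset>_<year> bucket and zero-padded
    day-of-year key scheme, so the range may cross a year boundary.
    """
    inputs = []
    day = start
    while day <= end:
        inputs.append([f"{dataset}_{day.year}", f"{day.timetuple().tm_yday:03d}"])
        day += timedelta(days=1)
    return inputs


# Only the "inputs" section of a job; a real job also needs its "query" phases.
job = {"inputs": mapreduce_inputs("quikscat", date(2009, 12, 30), date(2010, 1, 2))}
print(json.dumps(job["inputs"]))
# [["quikscat_2009", "364"], ["quikscat_2009", "365"], ["quikscat_2010", "001"], ["quikscat_2010", "002"]]
```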
danoyoung # yeah, we basically have very different "schemas" based on datasets, and trying to make an RDBMS (Postgres) work for this type of data has been painful. Riak seems to be a nice fit. In addition, we have close to a petabyte of data that will need to be queryable; right now we can't do it.
seancribbs # yes, that would be hard
danoyoung # it's more the schema constraints... the amount of data doesn't help either, but with each new satellite NASA launches, we get an entirely different data set shape, so it's challenging to support the unknown schema ahead of time.