Created June 24, 2010 12:18 (gist PharkMillups/451370)
| danoyoung # when passing keys into a map/reduce job, is there a way to | |
| send in a wildcard pattern, something like 2009_* as the key argument? | |
| seancribbs # danoyoung: no, but you can send a whole bucket | |
| danoyoung # I have a bucket called nsidc_0452 and store keys for | |
| various years of data, i.e. 1998->2009 | |
| seancribbs # and then filter out ones you don't want | |
| danoyoung # yea, I know I can send the whole bucket...I was just curious | |
if I could do a bucket,key (wildcard) combo.... ok, thanx sean | |
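Since a wildcard key is not accepted as map/reduce input here, one workaround is to list the bucket's keys and filter them on the client before building the job's inputs. This is a minimal sketch under stated assumptions: the key list has already been fetched, and the job takes inputs as `[bucket, key]` pairs. `keys_matching` and `build_inputs` are hypothetical helpers, not part of any Riak client.

```python
from fnmatch import fnmatchcase
from typing import List


def keys_matching(keys: List[str], pattern: str) -> List[str]:
    """Return the keys that match a shell-style wildcard such as '2009_*'."""
    # fnmatchcase compares case-sensitively on every platform.
    return [k for k in keys if fnmatchcase(k, pattern)]


def build_inputs(bucket: str, keys: List[str]) -> List[List[str]]:
    """Turn a key list into [bucket, key] input pairs for a map/reduce job."""
    return [[bucket, k] for k in keys]


if __name__ == "__main__":
    all_keys = ["2008_365_a", "2009_001_b", "2009_002_c", "2010_001_d"]
    wanted = keys_matching(all_keys, "2009_*")
    print(build_inputs("nsidc_0452", wanted))
    # prints [['nsidc_0452', '2009_001_b'], ['nsidc_0452', '2009_002_c']]
```

This still pays for listing every key in the bucket; it only avoids sending unwanted keys into the job.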
| justinsheehy # if the bucket is much bigger than the set you want, you | |
| might (or might not) do better by having a streaming list_keys operation | |
| sent to a process that filters keys, which then streams the smaller set to a MR job. | |
| seancribbs # ^^ justinsheehy | |
| justinsheehy # since that won't require loading the objects from disk before filtering | |
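justinsheehy's suggestion can be sketched as a generator that consumes key batches as they stream in and passes on only the keys worth sending to the job. This is a minimal sketch, assuming the streaming list-keys operation delivers keys in chunks; `fake_list_keys_stream` is a hypothetical stand-in for that call, not a real client function. Only key names are inspected, so no object has to be read before filtering.

```python
from typing import Iterable, Iterator, List


def fake_list_keys_stream() -> Iterator[List[str]]:
    """Hypothetical stand-in for a streaming list-keys call (yields key batches)."""
    yield ["1998_001_a", "2009_001_b"]
    yield ["2009_002_c", "2005_100_d"]


def filter_key_stream(chunks: Iterable[List[str]], prefix: str) -> Iterator[str]:
    """Yield matching keys as each batch arrives, without fetching any values."""
    for chunk in chunks:
        for key in chunk:
            if key.startswith(prefix):
                yield key


if __name__ == "__main__":
    # Collect the (smaller) filtered set, e.g. to build map/reduce inputs.
    wanted = list(filter_key_stream(fake_list_keys_stream(), "2009_"))
    print(wanted)  # ['2009_001_b', '2009_002_c']
```

Because the filter is lazy, matching keys can be forwarded while later batches are still arriving.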
| danoyoung # hmm..interesting, yes...I'll look into that | |
| danoyoung # we're going to have potentially hundreds of thousands of | |
| objects in the bucket...not sure yet if I should break out the data by | |
year or not, something like nsidc_0452_2009, nsidc_0452_2008, etc.... | |
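Splitting by year means that a whole-bucket job selects exactly one year, so no key filtering is needed. Here is a minimal sketch of that naming and of regrouping existing keys, assuming keys begin with the year followed by an underscore. `bucket_for_year` and `partition_by_year` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List


def bucket_for_year(base: str, year: int) -> str:
    """Per-year bucket name, e.g. ('nsidc_0452', 2009) -> 'nsidc_0452_2009'."""
    return f"{base}_{year}"


def partition_by_year(base: str, keys: List[str]) -> Dict[str, List[str]]:
    """Group keys shaped like '<year>_...' under their per-year bucket name."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        year = int(key.split("_", 1)[0])  # leading field is the year
        buckets[bucket_for_year(base, year)].append(key)
    return dict(buckets)


if __name__ == "__main__":
    print(partition_by_year("nsidc_0452", ["2008_365_a", "2009_001_b"]))
    # prints {'nsidc_0452_2008': ['2008_365_a'], 'nsidc_0452_2009': ['2009_001_b']}
```

The trade-off is more buckets to manage, and any cross-year query has to name several buckets as inputs.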
| danoyoung # is the list_keys functionality only in erlang client? | |
| drev1 # keys can be listed from the REST API and proto buffs client | |
| danoyoung # cool, does anyone know if there's a proto buff client for ruby yet? | |
| drev1 # someone has been working on one - http://github.com/aitrus/riak-pbclient | |
| danoyoung # got it, just searched github and saw this...thanx. | |
| seancribbs # danoyoung: yeah, you might also look into generating the | |
| key-list if you have adequate information about what the keys will be named | |
| danoyoung # what we have for this current dataset (and they're all different) | |
| is satellite measurements per day, and each day the satellite generates | |
| eight different measurements for the area of coverage we're interested in...i.e. Greenland | |
| danoyoung # so I'm thinking I need a key that can be unique enough for a | |
| given day...so I was fooling around with something like 2009_001_<uuid>...but | |
| that really doesn't give me enough info on the key to pass it onto a m/r | |
| function...I would need to know the uuid ahead of time. | |
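The problem with a `2009_001_<uuid>` key is that only its prefix can be computed in advance. This is a minimal sketch of that key shape, assuming a zero-padded day of year and a hex-formatted UUID4 suffix (both formatting choices are assumptions). `day_prefix` and `make_key` are hypothetical helpers.

```python
import uuid
from datetime import date


def day_prefix(day: date) -> str:
    """The predictable part of the key: '<year>_<zero-padded day-of-year>_'."""
    return f"{day.year}_{day.timetuple().tm_yday:03d}_"


def make_key(day: date) -> str:
    """A '2009_001_<uuid>'-style key; the uuid suffix is random on every call."""
    return day_prefix(day) + uuid.uuid4().hex


if __name__ == "__main__":
    key = make_key(date(2009, 1, 1))
    print(key.startswith("2009_001_"))  # True
    # The random suffix means the full key cannot be computed ahead of time;
    # it can only be found by listing keys and matching on the prefix.
```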
| drev1 # danoyoung: is your data set always going to be 8 readings a day per area? | |
| seancribbs # ooh here's an idea | |
| seancribbs # keep some meta records about your sources of info
| they might tell what periods of time are available, etc | |
| then you could use those in an initial map phase to generate the keys | |
| for the actual data | |
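If the count stays fixed (drev1's question about eight readings a day), seancribbs's meta-record idea could look like this. A meta record says which days exist for a source, and a map-style function expands it into data keys that later phases can use. This is a minimal Python sketch of the expansion logic only. It assumes a hypothetical deterministic key scheme `<year>_<day-of-year>_<reading>` in place of the uuid suffix, and hypothetical meta-record fields `bucket`, `year`, `days`, and `readings_per_day`. A real Riak map phase would be written in JavaScript or Erlang.

```python
from typing import Dict, List


def expand_meta(meta: Dict) -> List[List[str]]:
    """Map-style function: turn one meta record into [bucket, key] pairs.

    Assumes keys follow '<year>_<day-of-year>_<reading>' with a fixed
    number of readings per day, so no key listing is needed.
    """
    pairs = []
    for day in meta["days"]:
        for reading in range(meta["readings_per_day"]):
            key = f"{meta['year']}_{day:03d}_{reading}"
            pairs.append([meta["bucket"], key])
    return pairs


meta_2009 = {"bucket": "nsidc_0452", "year": 2009, "days": [1, 2], "readings_per_day": 8}

if __name__ == "__main__":
    pairs = expand_meta(meta_2009)
    print(len(pairs))  # 16
    print(pairs[:2])   # [['nsidc_0452', '2009_001_0'], ['nsidc_0452', '2009_001_1']]
```

The meta record is the only object that has to be read, and the generated keys name the data objects directly.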
| danoyoung # for at least this dataset, yes. | |
| danoyoung # yea, I was keeping data within the json struct with the year, day, etc.... | |
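Since each object already carries its year and day, the whole-bucket route can filter on those fields inside a map function. This is a minimal Python sketch of that filtering logic. It assumes the stored JSON has `year` and `day` fields (field names assumed) and uses a plain list of strings as a hypothetical stand-in for the bucket's values. An actual Riak map phase would be written in JavaScript or Erlang. This route has to load and parse every object before deciding, which is the cost justinsheehy pointed out.

```python
import json
from typing import List


def map_by_year(raw_value: str, year: int) -> List[dict]:
    """Map-style filter: parse one stored JSON object and keep it only if
    its 'year' field matches (an empty list means 'emit nothing')."""
    record = json.loads(raw_value)
    return [record] if record.get("year") == year else []


stored = [
    '{"year": 2008, "day": 365, "value": 1.5}',
    '{"year": 2009, "day": 1, "value": 2.0}',
]

if __name__ == "__main__":
    results = [r for v in stored for r in map_by_year(v, 2009)]
    print(results)  # [{'year': 2009, 'day': 1, 'value': 2.0}]
```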