@charl
Created January 10, 2013 09:06
What is the most efficient way to filter documents with a structure similar to the one below, based on the value of the 'expanded_url' key? There are two more general questions here:
* How do you filter documents based on one or more nested keys?
* How do you filter documents based on the value of a key that resolves to an array of objects?
{
"created_at": "Tue, 01 Jan 2013 12:55:09 +0000",
"entities": {
"hashtags": [{
"text": "Clojure",
"indices": [39, 47]
}],
"urls": [{
"url": "http://t.co/ythjuG5U",
"expanded_url": "http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=all&lang=clojure&lang2=java",
"display_url": "benchmarksgame.alioth.debian.org/u32/benchmark.…",
"indices": [118, 138]
}],
"user_mentions": []
},
"from_user": "foo",
"from_user_id": 000000000,
"from_user_id_str": "000000000",
"from_user_name": "Foo Bar",
"geo": null,
"id": 0000000000000000000,
"id_str": "0000000000000000000",
"iso_language_code": "en",
"metadata": {
"result_type": "recent"
},
"profile_image_url": "http://example.com/profile_images/000000000/avatar_normal.png",
"profile_image_url_https": "https://si0.twimg.com/profile_images/000000000/avatar_normal.png",
"source": "<a href=\"http://twitter.com/\">web</a>",
"text": "New \"Programming Clojure\" still claims #Clojure is at \"The speed of hand-written Java code\" when it clearly isn't :-( http://t.co/ythjuG5U",
"to_user": null,
"to_user_id": 0,
"to_user_id_str": "0",
"to_user_name": null
}
@charl
Author

charl commented Jan 10, 2013

Here's the error I get when running a naive filter in the Ruby console:

ruby :132 >   r.expr([286108290734252030]).map {|id| r.table(table).get(id)}.filter {|t| t[:entities][:urls].contains('expanded_url')}.run
RuntimeError: RQL: Data: 
[{
        "url":  "https://t.co/QC0jAzFB",
        "expanded_url": "https://groups.google.com/forum/?fromgroups=#!topic/vimclojure/B-UU8qctd5A",
        "display_url":  "groups.google.com/forum/?fromgro…",
        "indices":  [72, 93]
    }]
must be an object
Line: (irb):132:in `evaluate'
Query: [286108290734252030].map("_var_1048", getbykey([:default_db, "tweets"], :id, _var_1048)).filter("_var_1049", _var_1049[:entities][:urls].contains("expanded_url"))
                                                                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
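
Reading the error, it looks like `contains` was handed the array stored under `urls` while it expects an object. The shape mismatch can be sketched in plain Ruby (this is not ReQL and does not touch RethinkDB; the `doc` hash is a trimmed copy of the data in the error, and `urls_is_array` / `has_expanded` are names made up for illustration):

```ruby
# Plain Ruby, not ReQL: a trimmed copy of the document shape from the error.
doc = {
  entities: {
    urls: [
      { url: 'https://t.co/QC0jAzFB',
        expanded_url: 'https://groups.google.com/forum/?fromgroups=#!topic/vimclojure/B-UU8qctd5A' }
    ]
  }
}

urls = doc[:entities][:urls]

# The value under :urls is an Array, so asking it for a key is a type mismatch.
urls_is_array = urls.is_a?(Array)

# Each element of the array is a Hash, and those can be asked about their keys.
has_expanded = urls.any? { |u| u.key?(:expanded_url) }

puts urls_is_array
puts has_expanded
```

So any key test presumably has to be applied per element of the array rather than to the array itself.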

@neumino

neumino commented Jan 10, 2013

The only thing I can think of right now is using concatMap and checking the length of the result.
It's definitely not optimal, but we don't have many methods to help with walking through an array yet.
I'll open an issue for that.

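r.expr(['id1', 'id2']).map( function(id) {
     return r.db('db_name').table('table_name').get(id)
}).filter( function(valid_doc) {
     // Paths follow the tweet document above: the URLs live under entities.urls.
     // Emit [true] for each URL that is a key of the lookup object, [] otherwise,
     // then keep the document if at least one URL matched.
     return valid_doc('entities')('urls').concatMap(
         function(url) {
             return r.branch(r.expr({'www.valid_url.com': true, 'www.other_url.com': true}).contains(url('expanded_url'))
            , [true]
            , []
      )}).count().gt(0)
}).run()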
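
For readers who want to see what the concatMap-and-count idea computes without a database, here is a rough plain-Ruby equivalent on in-memory hashes. It is only a sketch of the logic, not ReQL: `WANTED`, `matches_any_url?`, and the sample `docs` are hypothetical, and `flat_map` stands in for concatMap.

```ruby
# Plain Ruby, not ReQL. WANTED and matches_any_url? are hypothetical names.
WANTED = {
  'www.valid_url.com' => true,
  'www.other_url.com' => true
}.freeze

def matches_any_url?(doc)
  # Emit [true] for every wanted URL and [] otherwise; flat_map flattens
  # the per-URL results into one list, much like concatMap.
  hits = doc.fetch(:entities, {}).fetch(:urls, []).flat_map do |u|
    WANTED.key?(u[:expanded_url]) ? [true] : []
  end
  # A document is kept when at least one URL matched.
  hits.count > 0
end

# Made-up sample documents with the same nesting as the tweet above.
docs = [
  { id: 1, entities: { urls: [{ expanded_url: 'www.valid_url.com' }] } },
  { id: 2, entities: { urls: [{ expanded_url: 'example.org' }] } },
  { id: 3, entities: { urls: [] } }
]

kept_ids = docs.select { |d| matches_any_url?(d) }.map { |d| d[:id] }
p kept_ids
```

With the sample data only the first document is kept. In plain Ruby the same test could be written more directly with `any?`; the longer form is used here only to mirror the query above.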

@coffeemug

Discussion on this is here: rethinkdb/rethinkdb#211
