@ippa
Last active December 17, 2015 11:49
rethinkdb braindump

Coming from raw SQL and then ActiveRecord (Ruby), these were my first impressions of and thoughts on RethinkDB. It's just a quick braindump, so take it for what it is :).

  • Why not have RethinkDB::RQL#inspect call run() so I can play around more easily in irb, constructing queries and seeing the results directly? How useful is it really to have the query I just constructed printed again?

  • I was running 1.4.5, "apt-get update" got me 1.5, and suddenly my database wasn't working. The package was released before the blog post with migration instructions. It's an edge case, but I managed to get bitten by it :P. Could the log entry contain the URL of the migration instructions?

  • Why not just order() instead of order_by()? You don't have filter_by() or replace_with(). Maybe I'm colored by ActiveRecord in this case, but less is usually more. It's hard to confuse a simpler order() with anything else.

  • r.connect should take a hash/JSON argument. Then I could do r.connect(db: "heroes"), and host would default to localhost and port to 28015 as usual. That way I could connect to my database without having to remember the port, since the database argument comes last.

  • The whole query.run(connection) structure looked odd to me in the beginning. I was wondering why it wasn't this (to me the more natural form in Ruby):


  connection = RethinkDB::Connection.new
  connection.db("foo").table("bar").filter(...).run

That feels like a more natural flow: you start with the connection to a database, pick a table on it, and then do your query. I'm sure there's a good reason the connection is the argument to run(), but to a Rubyist who is 100% new to RethinkDB it looks a bit odd.

  • You already have the technology with the data explorer, so why not add a "RUN" button to all examples at http://rethinkdb.com/api/? It could query against some test database and show the output. It would be a killer feature when it comes to understanding the API. Relevant: https://groups.google.com/forum/?fromgroups#!topic/rethinkdb/8XraaGno4QE

  • What's the difference between get_all('foo', index: "name") and filter({name: "foo"})? get_all can take an explicit index argument but filter can't. Couldn't every case you can solve with get_all be solved with filter if it took that index argument?

@coffeemug

Hi. Your feedback is great -- thank you very much. (We unfortunately have far more requests than time, but rest assured that we'll do all we can to process it and make the product better.) I'll ping @mlucy (the Ruby driver maintainer) to chime in on Ruby-specific issues, but I wanted to tell you about some specifics I personally know about.

I was running 1.4.5 and "apt-get update" got me 1.5 and suddenly my database wasn't working.

Since the product is young, we've traditionally cared much more about fixing bugs quickly than about making the migration process seamless at the expense of slower development on our part. We essentially decided that fixing issues quickly is a better service to our users. Now that a lot more people are using Rethink, that trade-off no longer holds, so we'll transition to a nicer migration process soon.

r.connect should take a hash/json argument

This is how the JavaScript driver works, and IMO it's awesome. I'm not a Ruby user, but I think I'd prefer this in Ruby too.
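
For illustration, here is a minimal Ruby sketch of the options-hash style being discussed. It is not the driver's actual API: `connect_options` and `DEFAULT_CONNECT_OPTIONS` are hypothetical names, and only the localhost/28015 defaults come from the thread (no default database is assumed here).

```ruby
# Hypothetical helper, NOT part of the RethinkDB Ruby driver.
# Host and port defaults are the ones mentioned in the thread;
# leaving db as nil is an assumption made for this sketch.
DEFAULT_CONNECT_OPTIONS = { host: "localhost", port: 28015, db: nil }.freeze

# Merge caller-supplied options over the defaults, rejecting unknown keys
# so that a typo such as `prot:` fails loudly instead of being ignored.
def connect_options(overrides = {})
  unknown = overrides.keys - DEFAULT_CONNECT_OPTIONS.keys
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  DEFAULT_CONNECT_OPTIONS.merge(overrides)
end

# Only the database is named; host and port fall back to the defaults.
p connect_options(db: "heroes")
```

With named options the caller no longer has to remember argument order, which is the point raised in the original feedback.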

The whole query.run(connection) structure looked odd to me in the beginning.

The thought process behind this is that any given query is just an abstract syntax tree. You can construct it independently of a connection, and can reuse it. For example:

my_users = r.table('my_table').filter({ 'users': 'some_condition' })
my_users.order_by().run(conn)
my_users.pluck().run(conn)
# You can also chain indefinitely
my_users.filter(...).pluck(...).eq_join(...).order_by(...)

This can get very sophisticated, and is really really nice once you get used to it. So we decided to let people create queries independently of the connection to make all of this functionality nice.
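
To make the "query is just a syntax tree" idea concrete, here is a toy Ruby model. It is only a sketch under stated assumptions: `Query` and `FakeConnection` are hypothetical classes, the "tree" is simplified to a flat list of steps, and the in-memory tables stand in for a real server.

```ruby
# Toy model only: these classes are hypothetical, not the driver's internals.
class Query
  attr_reader :steps

  def initialize(steps = [])
    @steps = steps.freeze
  end

  # Each builder returns a NEW query with one more step; the receiver is unchanged.
  def table(name)
    Query.new(steps + [[:table, name]])
  end

  def filter(conditions)
    Query.new(steps + [[:filter, conditions]])
  end

  def pluck(*fields)
    Query.new(steps + [[:pluck, fields]])
  end

  # Nothing is evaluated until a connection is supplied.
  def run(conn)
    conn.execute(steps)
  end
end

# Stand-in for a server: evaluates the steps against in-memory tables.
class FakeConnection
  def initialize(tables)
    @tables = tables
  end

  def execute(steps)
    steps.reduce(nil) do |rows, (op, arg)|
      case op
      when :table  then @tables.fetch(arg)
      when :filter then rows.select { |row| arg.all? { |k, v| row[k] == v } }
      when :pluck  then rows.map { |row| row.slice(*arg) }
      end
    end
  end
end

conn = FakeConnection.new(
  "heroes" => [
    { name: "Thor",  team: "avengers" },
    { name: "Hulk",  team: "avengers" },
    { name: "Flash", team: "jla" }
  ]
)

# Built once, without any connection, then reused for two different queries.
avengers = Query.new.table("heroes").filter(team: "avengers")
names = avengers.pluck(:name).run(conn)
full  = avengers.run(conn)
```

Because every builder call returns a new object, `avengers` can be extended in several directions without one use affecting another.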

You already have the technology with the data-explorer, why not add a "RUN"-button to all examples

It's a great idea. We were going to, but it's a bit challenging and we're fixing some lower-hanging fruit first. I'd definitely love to get to this soon.

what's the difference between get_all('foo', index: "name") vs filter({name: "foo"})

filter does a linear scan through the table, applies a condition to each row, and uses the row if the condition passes. In your example above, the two queries return exactly the same result, but get_all is faster because it uses an index. There is nothing preventing us from implementing an optimizer in filter that will automatically pick an index and convert it to get_all under the hood, but we decided that introducing a new command gives the following benefits:

  1. We can ship quickly and give users secondary indexes that solve 90% of their problems without a massive engineering cost of developing a full optimizer.
  2. Optimizers often make mistakes, which cause disastrous results in OLTP applications. So an explicit way to control indexes is actually really nice.
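
The scan-versus-index difference can be sketched in plain Ruby. This is a toy in-memory model, not how the server is implemented: `filter_scan` and `get_all_indexed` are illustrative names, and a Hash stands in for the real index structure.

```ruby
# Toy in-memory model; the method names below are illustrative, not driver calls.
rows = [
  { id: 1, name: "foo" },
  { id: 2, name: "bar" },
  { id: 3, name: "foo" }
]

# Linear scan: every row is tested against the condition.
def filter_scan(rows, conditions)
  rows.select { |row| conditions.all? { |key, value| row[key] == value } }
end

# A secondary index on :name, built once: value => matching rows.
name_index = rows.group_by { |row| row[:name] }

# Indexed lookup: a single hash access instead of a pass over the table.
def get_all_indexed(index, value)
  index.fetch(value, [])
end

scanned = filter_scan(rows, name: "foo")
indexed = get_all_indexed(name_index, "foo")
```

Both calls return the same rows; the difference is that the scan touches every row while the lookup goes straight to the matches, and the caller chooses explicitly which one to use.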

Hope this makes sense!
