
@sagivf
Last active March 18, 2017 22:00
RethinkDB count issue and solutions

Problem

Calling count() on a table is O(n) in the number of documents.

This can send a new developer running for the hills: counting rows seems like a trivial problem, but it is not. While we hope this gets addressed in the future (even in a non-ideal way), there are workarounds.

Relevant Issues:

Solutions

  1. Use the table's info() command if an estimate is enough:

r.db('DB').table('TABLE').info()('doc_count_estimates').sum()

doc_count_estimates holds one estimate per shard, so summing it also covers sharded tables.

  2. Upgrade your cluster: a sharded cluster with strong servers (SSDs, plenty of memory, etc.) helps a lot. You can also increase --cache-size.

  3. Add a table that saves your counts. You can:

  • increment the count on every insert (and decrement it on every delete)
  • update it from a changefeed, preferably with squash
  • just save the result of count() every now and again.

  4. Add a counter field (e.g. "position", "i" or "inserted") to the table and maintain its latest value in memory on inserts. That way, the last record sorted by an index on that field holds the count in that property (as long as records are never deleted).
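The changefeed option above boils down to turning each change event into a +1, -1 or 0 adjustment of a saved count. The sketch below assumes events with RethinkDB's standard old_val/new_val shape, coming from a feed opened with something like r.table('TABLE').changes({squash: true}) and without includeInitial (initial results would otherwise be counted as inserts). The helper names countDelta and applyChanges are hypothetical, and the driver/cursor wiring and the write back to the counts table are left out.

```javascript
// Minimal sketch: keep a running count from changefeed events.
// Assumption: each event has the { old_val, new_val } shape produced by
// r.table('TABLE').changes({squash: true}); driver wiring is omitted.

// Hypothetical helper. Returns +1 for an insert, -1 for a delete and 0 for
// an update. Events with neither side set (e.g. state objects, or a squashed
// insert+delete) also return 0.
function countDelta(change) {
  const hadDoc = change.old_val != null; // != null also catches undefined
  const hasDoc = change.new_val != null;
  if (!hadDoc && hasDoc) return 1;
  if (hadDoc && !hasDoc) return -1;
  return 0;
}

// Hypothetical helper. Applies a batch of events to a previously saved count.
function applyChanges(count, changes) {
  return changes.reduce((acc, change) => acc + countDelta(change), count);
}

// Usage with hand-written events (illustrative, not real feed output).
const events = [
  { old_val: null, new_val: { id: 1 } },            // insert
  { old_val: null, new_val: { id: 2 } },            // insert
  { old_val: { id: 1 }, new_val: { id: 1, x: 5 } }, // update
  { old_val: { id: 2 }, new_val: null },            // delete
];
console.log(applyChanges(10, events)); // 11
```

Because the delta depends only on whether old_val and new_val are null, a squashed event that merges several writes to one document should still yield the right net change.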
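For the position field approach, the in-memory bookkeeping might look like the sketch below. PositionCounter, stamp and the field name position are hypothetical; the sketch assumes a single writer process, seeded with the highest position already stored, and no deletes. Each stamped document would then be written with r.table('TABLE').insert(...), and the count read back with a query along the lines of r.table('TABLE').orderBy({index: r.desc('position')}).limit(1), given a secondary index on position.

```javascript
// Hypothetical helper: hands out increasing "position" values on insert.
// Assumes one writer process and no deletes; with deletes, the highest
// position would overstate the real count.
class PositionCounter {
  // lastPosition should be seeded from the highest position already stored.
  constructor(lastPosition = 0) {
    this.lastPosition = lastPosition;
  }

  // Returns a copy of doc with the next position attached.
  stamp(doc) {
    this.lastPosition += 1;
    return { ...doc, position: this.lastPosition };
  }

  // Under the assumptions above, the latest position equals the row count.
  count() {
    return this.lastPosition;
  }
}

// Usage: stamp three new documents before inserting them.
const counter = new PositionCounter(0);
const rows = [{ name: "a" }, { name: "b" }, { name: "c" }].map((d) => counter.stamp(d));
console.log(rows[2].position, counter.count()); // 3 3
```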

Comment

To the best of my knowledge, if the bulk of your work is processing and analyzing tables with millions of rows, RethinkDB is probably not your best solution. You could also combine it with another database.
