Skip to content

Instantly share code, notes, and snippets.

@par6n
Created Dec 13, 2019
Embed
What would you like to do?

Chabok job offer exercise

This is an example which uses Node.js as the runtime environment and ClickHouse, which is a column-oriented database management system.

Installation

  1. Create a ClickHouse cluster.
  2. Setup the connection by setting following environment variables: DB_HOST, DB_USER, DB_PASS (you can study the fallback values here.)
  3. Install the script dependencies:
$ npm install
$ yarn # or use yarn, which is a lot better.
  1. Run seed.js file which creates a table named Chabox.Event and populates it with 10 millions records (you can alter the records counts to create by changing ROWS_COUNT constant.)
$ node seed

Performance

The populating operation took me 1999899.612ms (~33 minutes) on a virtual private server with 1 GB of RAM and uses one core of an Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz. Data writing rate was ~5002 rows/seconds.

Funnel query example

I've used weighted randoms in my seeder script, so the outcome of my funnel query would be more predictable.

I created a basic funnel using the following query:

--- First query
SELECT
  COUNT(e.ID) As AllEvents,
  COUNT(multiIf(e.Name = 'logged_in', e.ID, NULL)) AS LoggedIn,
  COUNT(multiIf(e.Name = 'added_to_cart', e.ID, NULL)) AS AddedToCart,
  COUNT(multiIf(e.Name = 'visited_checkout', e.ID, NULL)) AS VisitedCheckout,
  COUNT(multiIf(e.Name = 'purchased', e.ID, NULL)) AS Purchased
FROM Chabox.Event As e
WHERE e.Date BETWEEN '2019-12-01 00:00:00' AND '2019-12-15 23:59:59';

Which produced the following performance stats:

1 rows in set. Elapsed: 2.387 sec. Processed 10.00 million rows, 209.39 MB (4.19 million rows/s., 87.72 MB/s.) 

Executing a simple OPTIMIZE query would increase the performance slightly.

OPTIMIZE TABLE Chabox.Event;

Running the first query after optimization produces this:

1 rows in set. Elapsed: 2.173 sec. Processed 10.00 million rows, 209.39 MB (4.60 million rows/s., 96.34 MB/s.)

Deleting the WHERE clause doesn't impact the query performance that much:

1 rows in set. Elapsed: 2.250 sec. Processed 10.00 million rows, 418.63 MB (4.44 million rows/s., 186.03 MB/s.) 
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment