Chabok job offer exercise
This is an example which uses Node.js as the runtime environment and ClickHouse, which is a column-oriented database management system.
- Create a ClickHouse cluster.
- Setup the connection by setting following environment variables:
DB_PASS(you can study the fallback values here.)
- Install the script dependencies:
$ npm install $ yarn # or use yarn, which is a lot better.
seed.jsfile which creates a table named
Chabox.Eventand populates it with 10 millions records (you can alter the records counts to create by changing
$ node seed
The populating operation took me
1999899.612ms (~33 minutes) on a virtual private server with 1 GB of RAM and uses one core of an
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz. Data writing rate was
Funnel query example
I've used weighted randoms in my seeder script, so the outcome of my funnel query would be more predictable.
I created a basic funnel using the following query:
--- First query SELECT COUNT(e.ID) As AllEvents, COUNT(multiIf(e.Name = 'logged_in', e.ID, NULL)) AS LoggedIn, COUNT(multiIf(e.Name = 'added_to_cart', e.ID, NULL)) AS AddedToCart, COUNT(multiIf(e.Name = 'visited_checkout', e.ID, NULL)) AS VisitedCheckout, COUNT(multiIf(e.Name = 'purchased', e.ID, NULL)) AS Purchased FROM Chabox.Event As e WHERE e.Date BETWEEN '2019-12-01 00:00:00' AND '2019-12-15 23:59:59';
Which produced the following performance stats:
1 rows in set. Elapsed: 2.387 sec. Processed 10.00 million rows, 209.39 MB (4.19 million rows/s., 87.72 MB/s.)
Executing a simple
OPTIMIZE query would increase the performance slightly.
OPTIMIZE TABLE Chabox.Event;
Running the first query after optimization produces this:
1 rows in set. Elapsed: 2.173 sec. Processed 10.00 million rows, 209.39 MB (4.60 million rows/s., 96.34 MB/s.)
Deleting the WHERE clause doesn't impact the query performance that much:
1 rows in set. Elapsed: 2.250 sec. Processed 10.00 million rows, 418.63 MB (4.44 million rows/s., 186.03 MB/s.)