This is an example which uses Node.js as the runtime environment and ClickHouse, which is a column-oriented database management system.
- Create a ClickHouse cluster.
- Setup the connection by setting following environment variables:
DB_HOST
,DB_USER
,DB_PASS
(you can study the fallback values here.) - Install the script dependencies:
$ npm install
$ yarn # or use yarn, which is a lot better.
- Run
seed.js
file which creates a table namedChabox.Event
and populates it with 10 millions records (you can alter the records counts to create by changingROWS_COUNT
constant.)
$ node seed
The populating operation took me 1999899.612ms
(~33 minutes) on a virtual private server with 1 GB of RAM and uses one core of an Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
. Data writing rate was ~5002 rows/seconds
.
I've used weighted randoms in my seeder script, so the outcome of my funnel query would be more predictable.
I created a basic funnel using the following query:
--- First query
SELECT
COUNT(e.ID) As AllEvents,
COUNT(multiIf(e.Name = 'logged_in', e.ID, NULL)) AS LoggedIn,
COUNT(multiIf(e.Name = 'added_to_cart', e.ID, NULL)) AS AddedToCart,
COUNT(multiIf(e.Name = 'visited_checkout', e.ID, NULL)) AS VisitedCheckout,
COUNT(multiIf(e.Name = 'purchased', e.ID, NULL)) AS Purchased
FROM Chabox.Event As e
WHERE e.Date BETWEEN '2019-12-01 00:00:00' AND '2019-12-15 23:59:59';
Which produced the following performance stats:
1 rows in set. Elapsed: 2.387 sec. Processed 10.00 million rows, 209.39 MB (4.19 million rows/s., 87.72 MB/s.)
Executing a simple OPTIMIZE
query would increase the performance slightly.
OPTIMIZE TABLE Chabox.Event;
Running the first query after optimization produces this:
1 rows in set. Elapsed: 2.173 sec. Processed 10.00 million rows, 209.39 MB (4.60 million rows/s., 96.34 MB/s.)
Deleting the WHERE clause doesn't impact the query performance that much:
1 rows in set. Elapsed: 2.250 sec. Processed 10.00 million rows, 418.63 MB (4.44 million rows/s., 186.03 MB/s.)