The generator.rb file in this repo creates semi-random activity data that can be used for experimenting with HyperLogLog.
Crunchy Data has a full HLL tutorial at https://crunchydata.com/blog/high-compression-metrics-stograge-with-postgres-hyperloglog
Running it starts a continuously running process that creates two tables, customers and activities, and then keeps adding rows to both in a way that resembles real customer behavior.
The automatically created customers table looks like this:
               Table "public.customers"
 Column |  Type   | Collation | Nullable |             Default
--------+---------+-----------+----------+----------------------------------
 id     | integer |           | not null | generated by default as identity
 name   | text    |           | not null |
 email  | text    |           | not null |
The automatically created activities table looks like this:
                         Table "public.activities"
   Column    |            Type             | Collation | Nullable | Default
-------------+-----------------------------+-----------+----------+---------
 received_at | timestamp without time zone |           | not null |
 customer_id | integer                     |           | not null |
 action      | text                        |           | not null |
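To make the shape of the data concrete, here is a minimal sketch of how rows matching the two tables above could be built in plain Ruby. It is not the code in generator.rb: the action names, the `fake_customer` and `fake_activity` helpers, and the example.com addresses are all assumptions made for illustration, and nothing here touches a database.

```ruby
require "securerandom"

# Hypothetical action names; the real generator.rb may use different ones.
ACTIONS = %w[login view_page add_to_cart purchase logout].freeze

# Build one row shaped like the customers table. The id is omitted because
# the column is an identity column that the database fills in.
def fake_customer
  handle = SecureRandom.hex(4)
  { name: "Customer #{handle}", email: "#{handle}@example.com" }
end

# Build one row shaped like the activities table, picking a random
# customer id, a random action, and a timestamp within the last hour.
def fake_activity(customer_ids, now: Time.now)
  {
    received_at: now - rand(0..3600),
    customer_id: customer_ids.sample,
    action: ACTIONS.sample
  }
end

customers  = Array.new(3) { fake_customer }
activities = Array.new(5) { fake_activity([1, 2, 3]) }
```

Because a small set of customers repeats across many activity rows, data like this gives distinct-count estimates something meaningful to measure.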
I expect you have a decently functioning Ruby environment. If not, consult a tutorial such as RubyMine's "Set up a Ruby development environment".
- Clone this repo:
git clone git@gist.github.com:8c594d836c7a802bff1f0749e67c8ba4.git
- Install the Postgres and Faker gems:
bundler install
- Run the generator with a database:
DATABASE_URL="postgres://user:pass@host:port/dbname" ruby generator.rb
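The values in that URL are placeholders, and the port needs to be a number such as 5432 (the Postgres default). If the generator fails to connect, a quick way to check the URL is to split it into its parts with Ruby's standard `uri` library. The `describe_database_url` helper below is a hypothetical name and is not part of generator.rb.

```ruby
require "uri"

# Split a Postgres connection URL into its parts so typos are easy to spot.
# The password is deliberately left out of the result.
def describe_database_url(url)
  uri = URI.parse(url)
  {
    scheme: uri.scheme,
    user: uri.user,
    host: uri.host,
    port: uri.port,
    dbname: uri.path.to_s.delete_prefix("/")
  }
end

# Example values only; substitute your own connection details.
parts = describe_database_url("postgres://user:pass@localhost:5432/dbname")
```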
Once this works, you'll be off and running to test HyperLogLog on Postgres.
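As a starting point, the sketch below compares an approximate and an exact count of distinct customers per hour in the activities table. It assumes the postgresql-hll extension (`hll`) can be installed on your server and that you pass in a connection object with an `exec` method, such as one from the pg gem. The `approx_customers_per_hour` helper and the query itself are illustrative and do not come from generator.rb or the tutorial.

```ruby
# Approximate vs. exact distinct customers per hour.
# hll_hash_integer, hll_add_agg and hll_cardinality come from postgresql-hll.
APPROX_CUSTOMERS_SQL = <<~SQL
  SELECT date_trunc('hour', received_at) AS hour,
         hll_cardinality(hll_add_agg(hll_hash_integer(customer_id))) AS approx_customers,
         count(DISTINCT customer_id) AS exact_customers
  FROM activities
  GROUP BY 1
  ORDER BY 1
SQL

# Enable the extension if needed, then return one row per hour.
# `conn` is assumed to respond to `exec` the way a pg connection does.
def approx_customers_per_hour(conn)
  conn.exec("CREATE EXTENSION IF NOT EXISTS hll")
  conn.exec(APPROX_CUSTOMERS_SQL).to_a
end

# Usage, assuming the pg gem is installed and DATABASE_URL is set:
#   require "pg"
#   conn = PG.connect(ENV["DATABASE_URL"])
#   approx_customers_per_hour(conn).each { |row| puts row }
```

Running the exact count next to the estimate makes it easy to see how close the HyperLogLog result is on the generated data.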