Relational databases such as PostgreSQL can do much more than store and return your application records. Those databases usually come with comprehensive compute capabitilies.
In this short note, let's review the generate_series function of PostgreSQL, then can generate mock data of various complexity.
-
Create a sample table for time-series data:
create table sensor ( id int, value int, event_time timestamp);
-
Create a helper function that generates random integer numbers withing a range:
CREATE OR REPLACE FUNCTION random_between(low INT ,high INT) RETURNS INT AS $$ BEGIN RETURN floor(random()* (high-low + 1) + low); END; $$ language 'plpgsql' STRICT;
-
Generate mock data with the built-in
generate_series
function:insert into sensor select random_between (1,500), /* generating a random sensor ID */ random_between (0,200), /* generating random measurement value reported by the sensor */ generate_series('2022-10-24'::timestamp,'2023-10-24'::timestamp, '1 minute') /* generating the time of the measurement event */;
Note, the
generate_series
will create a record for every minute starting on2022-10-24
and ending on2023-10-24
.
It will take 10-30 seconds to generate the data set.
- Make sure you have more than a half a million records in the table:
select count(*) from sensor; count -------- 525601 (1 row)
- With the records looking as follows:
select * from sensor limit 5; id | value | event_time -----+-------+--------------------- 386 | 199 | 2022-10-24 00:00:00 180 | 44 | 2022-10-24 00:01:00 30 | 68 | 2022-10-24 00:02:00 220 | 40 | 2022-10-24 00:03:00 342 | 153 | 2022-10-24 00:04:00 (5 rows)