Skip to content

Instantly share code, notes, and snippets.

@dmagda
Last active May 24, 2023 05:29
Show Gist options
  • Save dmagda/53dcc107722fc79c7050c5aa99f17792 to your computer and use it in GitHub Desktop.
Save dmagda/53dcc107722fc79c7050c5aa99f17792 to your computer and use it in GitHub Desktop.
Mock data generation with built-in database capabilities

Mock data generation with built-in database capabilities

Relational databases such as PostgreSQL can do much more than store and return your application records. Those databases usually come with comprehensive compute capabitilies.

In this short note, let's review the generate_series function of PostgreSQL, then can generate mock data of various complexity.

  1. Create a sample table for time-series data:

    create table sensor (
      id int,
      value int,
      event_time timestamp);
  2. Create a helper function that generates random integer numbers withing a range:

    CREATE OR REPLACE FUNCTION random_between(low INT ,high INT) 
    RETURNS INT AS
    $$
    BEGIN
       RETURN floor(random()* (high-low + 1) + low);
    END;
    $$ language 'plpgsql' STRICT;
  3. Generate mock data with the built-in generate_series function:

    insert into sensor 
    select random_between (1,500), /* generating a random sensor ID */
         random_between (0,200), /* generating random measurement value reported by the sensor */
         generate_series('2022-10-24'::timestamp,'2023-10-24'::timestamp, '1 minute') /* generating the time of the measurement event  */;

    Note, the generate_series will create a record for every minute starting on 2022-10-24 and ending on 2023-10-24.

It will take 10-30 seconds to generate the data set.

  1. Make sure you have more than a half a million records in the table:
    select count(*) from sensor;
     count  
    --------
     525601
    (1 row)
  2. With the records looking as follows:
    select * from sensor limit 5;
    id  | value |     event_time      
    -----+-------+---------------------
     386 |   199 | 2022-10-24 00:00:00
     180 |    44 | 2022-10-24 00:01:00
      30 |    68 | 2022-10-24 00:02:00
     220 |    40 | 2022-10-24 00:03:00
     342 |   153 | 2022-10-24 00:04:00
    (5 rows)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment