Use case: I want to collect hackernews stories that mention certain keywords. The keywords are defined by a regex, for example steampipe|sql|postgres|\sapi\s'
.
The basic steampipe query:
select id, 'hn_new' as source, title, time from hackernews_new where title ~* hn_search_regex()
union
select id, 'hn_show' as source, title, time from hackernews_show_hn where title ~* hn_search_regex()
union
select id, 'hn_ask' as source, title, time from hackernews_ask_hn where title ~* hn_search_regex()
order by time desc;
To avoid having to rewrite the regex in many places, I've encapsulated it in a Postgres function, hn_search_regex()
.
create or replace function hn_search_regex() returns text as $$
begin
return squote('steampipe|sql|postgres|\sapi\s')
end;
$$ language plpgsql;
That function has a helper, squote()
, to wrap single quotes around the returned value.
Here's some output from the query.
id | source | title | time
----------+---------+---------------------------------------------------------------------------------+---------------------
28642326 | hn_new | A single person answered 76k questions about SQL on StackOverflow | 2021-09-24 13:52:08
28636464 | hn_show | Show HN: A Tableau+SQL course for data professionals | 2021-09-23 22:37:48
28630563 | hn_show | Show HN: pganalyze Index Advisor – A better way to index your Postgres database | 2021-09-23 15:25:47
The tables hn_new
, hn_ask
, and hn_show
return recent info. How to accumulate it over time? Here's one way: into an accumulator table.
create table hn(id bigint primary key, source text, title text, ts timestamp with timezone);
Here's its definition.
Table "public.hn"
Column | Type |
--------+--------------------------+
id | bigint |
source | text |
title | text |
ts | timestamp with time zone |
Indexes:
"hn_pkey" PRIMARY KEY, btree (id)
Here's a procedure to insert rows into that table.
create or replace procedure hn(id bigint, source text, title text, ts timestamp with time zone) as $$
begin
insert into hn(id, source, title, ts) values (id, source, title, ts);
end;
$$ language plplgsl;
Calling that procedure in steampipe.
psql -h localhost -p 9193 -d steampipe -U steampipe
steampipe=> call hn(1, 'hn_new', 'Sample title', current_timestamp);
CALL
steampipe=> call hn(1, 'hn_new', 'Sample title', current_timestamp);
ERROR: duplicate key value violates unique constraint "hn_pkey"
A file, hn.sql
, containing SQL that generating calls to the procedure
with data as (
select id, 'hn_new' as source, title, time from hackernews_new where title ~* hn_search_regex()
union
select id, 'hn_show' as source, title, time from hackernews_show_hn where title ~* hn_search_regex()
union
select id, 'hn_ask' as source, title, time from hackernews_ask_hn where title ~* hn_search_regex()
order by time desc
)
select
concat(
'call hn(',
d.id,
',' || squote(escape_single_quote(d.source)),
',' || squote(escape_single_quote(d.title)),
',' || squote(d.time::text),
');'
) as sql
from data d;
Ouput of that query at a point in time.
sql
--------------------------------------------------------------------------------------------------------------------------------------
call hn(28642326,'hn_new','A single person answered 76k questions about SQL on StackOverflow','2021-09-24 13:52:08');
call hn(28636464,'hn_show','Show HN: A Tableau+SQL course for data professionals','2021-09-23 22:37:48');
call hn(28630563,'hn_show','Show HN: pganalyze Index Advisor – A better way to index your Postgres database','2021-09-23 15:25:47');
A script, hn.sh
, to run that query, outputs rows only (-t) to a file, hn.txt
, strips double quotes, and then pipes the procedure calls into psql.
#!/bin/bash
cd /home/jon/playground
psql -h localhost -p 9193 -U steampipe -t --csv < hn.sql > hn.txt
sed -i 's/\"//g' hn.txt
psql -h localhost -p 9193 -U steampipe < hn.txt
A crontab entry to run hn.sh
every 30 minutes.
30 * * * * /home/jon/playground/hn.sh