Skip to content

Instantly share code, notes, and snippets.

@judell
Last active September 28, 2021 15:48
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save judell/888d994e8e2ea3a11372cfa0c15d6159 to your computer and use it in GitHub Desktop.
Save judell/888d994e8e2ea3a11372cfa0c15d6159 to your computer and use it in GitHub Desktop.
accumulating hackernews search results into steampipe

Use case: I want to collect hackernews stories that mention certain keywords. The keywords are defined by a regex, for example steampipe|sql|postgres|\sapi\s'.

The basic steampipe query:

select id, 'hn_new' as source, title, time from hackernews_new where title ~* hn_search_regex()
union
select id, 'hn_show' as source, title, time from hackernews_show_hn where title ~* hn_search_regex()
union
select id, 'hn_ask' as source, title, time from hackernews_ask_hn where title ~*  hn_search_regex()
order by time desc;

To avoid having to rewrite the regex in many places, I've encapsulated it in a Postgres function, hn_search_regex().

create or replace function hn_search_regex() returns text as $$
  begin
    return squote('steampipe|sql|postgres|\sapi\s')
  end;
$$ language plpgsql;

That function has a helper, squote(), to wrap single quotes around the returned value.

Here's some output from the query.

    id    | source  |                                      title                                      |        time
----------+---------+---------------------------------------------------------------------------------+---------------------
 28642326 | hn_new  | A single person answered 76k questions about SQL on StackOverflow               | 2021-09-24 13:52:08
 28636464 | hn_show | Show HN: A Tableau+SQL course for data professionals                            | 2021-09-23 22:37:48
 28630563 | hn_show | Show HN: pganalyze Index Advisor – A better way to index your Postgres database | 2021-09-23 15:25:47

The tables hn_new, hn_ask, and hn_show return recent info. How to accumulate it over time? Here's one way: into an accumulator table.

create table hn(id bigint primary key, source text, title text, ts timestamp with timezone);

Here's its definition.

     Table "public.hn"
 Column |           Type           | 
--------+--------------------------+
 id     | bigint                   |
 source | text                     |
 title  | text                     |
 ts     | timestamp with time zone |
Indexes:
    "hn_pkey" PRIMARY KEY, btree (id)

Here's a procedure to insert rows into that table.

create or replace procedure hn(id bigint, source text, title text, ts timestamp with time zone) as $$
  begin
    insert into hn(id, source, title, ts) values (id, source, title, ts);
  end;
$$ language plplgsl;

Calling that procedure in steampipe.

psql -h localhost -p 9193 -d steampipe -U steampipe

steampipe=> call hn(1, 'hn_new', 'Sample title', current_timestamp);
CALL

steampipe=> call hn(1, 'hn_new', 'Sample title', current_timestamp);
ERROR:  duplicate key value violates unique constraint "hn_pkey"

A file, hn.sql, containing SQL that generating calls to the procedure

with data as (
  select id, 'hn_new' as source, title, time from hackernews_new where title ~* hn_search_regex()
  union
  select id, 'hn_show' as source, title, time from hackernews_show_hn where title ~* hn_search_regex()
  union
  select id, 'hn_ask' as source, title, time from hackernews_ask_hn where title ~*  hn_search_regex()
  order by time desc
)
select
  concat(
    'call hn(',
    d.id,
    ',' || squote(escape_single_quote(d.source)),
    ',' || squote(escape_single_quote(d.title)),
    ',' || squote(d.time::text),
    ');'
  ) as sql
from data d;

Ouput of that query at a point in time.

                                                                 sql
--------------------------------------------------------------------------------------------------------------------------------------
 call hn(28642326,'hn_new','A single person answered 76k questions about SQL on StackOverflow','2021-09-24 13:52:08');
 call hn(28636464,'hn_show','Show HN: A Tableau+SQL course for data professionals','2021-09-23 22:37:48');
 call hn(28630563,'hn_show','Show HN: pganalyze Index Advisor – A better way to index your Postgres database','2021-09-23 15:25:47');

A script, hn.sh, to run that query, outputs rows only (-t) to a file, hn.txt, strips double quotes, and then pipes the procedure calls into psql.

#!/bin/bash

cd /home/jon/playground
psql -h localhost -p 9193 -U steampipe -t --csv < hn.sql > hn.txt
sed -i 's/\"//g' hn.txt
psql -h localhost -p 9193 -U steampipe < hn.txt

A crontab entry to run hn.sh every 30 minutes.

30 * * * * /home/jon/playground/hn.sh
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment