@jtarchie
Last active June 18, 2020 18:39

A table is defined as (this is simplified):

CREATE TABLE build_events (
   build_id INTEGER, -- associated with the builds table (joins not important here)
   event_id INTEGER, -- increases for every entry that is inserted, used in a subsequent ORDER BY
   payload TEXT      -- JSON payload
);

The payload column actually holds a JSON payload. At the moment, I don't have the ability to convert the column to a json or jsonb type.

The JSON payload contains three fields: time (unix epoch), origin (uuid), and (yet again) payload (string). An example entry:

{
  "time": 1122233344,
  "origin": "a1b2c3",
  "payload": "some entry from a log line"
}

No calculation is done against these entries, but a SELECT * FROM build_events WHERE build_id = ? ORDER BY event_id is done at some point. The event_id is used to order the entries (not time), probably as a simple optimization.

In the build_events table, there can be multiple entries for the same time. I'd like to group those (via string concatenation) into a single entry, keyed on the time value in the payload.

For example, given a table with the following entries:

build_id=1,event_id=1,payload=`{"time":"1234", "payload":"log line 1\n"}`
build_id=1,event_id=2,payload=`{"time":"1234", "payload":"log line 2\n"}`
build_id=1,event_id=3,payload=`{"time":"1235", "payload":"log line 3\n"}`
build_id=1,event_id=4,payload=`{"time":"1236", "payload":"log line 4\n"}`

I'd like to squash them together to look like:

build_id=1,event_id=1,payload=`{"time":"1234", "payload":"log line 1\nlog line 2\n"}`
build_id=1,event_id=3,payload=`{"time":"1235", "payload":"log line 3\n"}`
build_id=1,event_id=4,payload=`{"time":"1236", "payload":"log line 4\n"}`

Notice the payload on the first entry includes the concatenated payload strings of the two entries that share time 1234.
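
The transformation itself is just group-and-concatenate. Here is a minimal sketch of that step in plain Ruby, run against the sample rows above with the JSON already parsed (the rows and squashed names are illustrative; origin is omitted, as it is in the sample rows):

rows = [
  { event_id: 1, payload: { 'time' => '1234', 'payload' => "log line 1\n" } },
  { event_id: 2, payload: { 'time' => '1234', 'payload' => "log line 2\n" } },
  { event_id: 3, payload: { 'time' => '1235', 'payload' => "log line 3\n" } },
  { event_id: 4, payload: { 'time' => '1236', 'payload' => "log line 4\n" } }
]

squashed = rows
  .group_by { |row| row[:payload]['time'] }   # one group per time value
  .map do |_time, group|
    first = group.first
    # keep the first row of the group, replacing its payload string with the
    # concatenation of every payload string in the group
    first.merge(payload: first[:payload].merge(
      'payload' => group.map { |r| r[:payload]['payload'] }.join
    ))
  end
# squashed keeps event_id 1, 3 and 4; event_id 1 now carries "log line 1\nlog line 2\n"

The full script below does the same thing against the database, then writes the merged rows back and deletes the leftover rows: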

# frozen_string_literal: true

require 'pg'
require 'json'

build_id = '90809'
conn = PG.connect(ENV['DATABASE_URL'])

conn.exec_params(<<~SQL, [build_id]) do |results|
  SELECT
    build_id,
    event_id,
    payload
  FROM build_events
  WHERE
    build_id = $1 AND type = 'log'
  ORDER BY event_id;
SQL
  # parse every payload, then group the rows by their (time, origin) pair
  results = results.to_a.each do |entry|
    entry['payload'] = JSON.parse(entry['payload'])
  end.group_by do |entry|
    [entry.dig('payload', 'time'), entry['payload']['origin']]
  end

  # fold each group's payload strings into its first row, and UPDATE that row
  updated_event_ids = results.map do |_, records|
    event_id = records.first['event_id']
    payload = records.first['payload']
    payload['payload'] = records.map { |r| r.dig('payload', 'payload') }.join
    conn.exec_params(%(
      UPDATE build_events
      SET payload = $1
      WHERE build_id = $2 AND event_id = $3;
    ), [JSON.generate(payload), build_id, event_id])
    event_id
  end

  # every row that was folded into another row can now be removed
  delete_event_ids = results.values.flatten.map { |r| r['event_id'] } - updated_event_ids
  unless delete_event_ids.empty?
    conn.exec("DELETE FROM build_events WHERE build_id = #{build_id} AND event_id IN (#{delete_event_ids.join(',')})")
  end
end
@jchesterpivotal

jchesterpivotal commented Jun 18, 2020

If you can live without event_id, this appears to work:

create schema jt_tmp;
create table jt_tmp.build_events (
  build_id integer,
  event_id integer,
  payload text
);

insert into jt_tmp.build_events(build_id, event_id, payload) values
(1,1,'{"time":"1234", "payload":"log line 1\n"}'),
(1,2,'{"time":"1234", "payload":"log line 2\n"}'),
(1,3,'{"time":"1235", "payload":"log line 3\n"}'),
(1,4,'{"time":"1236", "payload":"log line 4\n"}');

select
    build_id
  , payload::json->>'time'                     as json_time
  , string_agg(payload::json->>'payload', '')  as json_payload
from jt_tmp.build_events
group by build_id, json_time;

@jchesterpivotal

This query appears to work on the sample data, but it has a bit of I-don't-really-understand-it magic involved. Plus the select distinct is ugly and will usually add a big ol' sort operation.

with extracted_from_json as (
  select
     build_id
   , event_id
   , payload::json->>'time'     as json_time
   , payload::json->>'payload'  as json_payload
  from jt_tmp.build_events
)
select distinct
    build_id
  , first_value(event_id) over (partition by build_id, json_time order by build_id, event_id) as event_id
  , json_time
  , string_agg(json_payload, '') over (partition by build_id, json_time) as json_payload
from extracted_from_json
group by build_id, event_id, json_time, json_payload
order by build_id, event_id, json_time
;
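
A possible simplification (a sketch, only checked against the sample data above): keep event_id by taking min() per group, and let a plain GROUP BY do the aggregation, which drops both the window functions and the select distinct. The order by inside string_agg is there because, without it, the concatenation order of the log lines isn't guaranteed:

select
    build_id
  , min(event_id)                                               as event_id
  , payload::json->>'time'                                      as json_time
  , string_agg(payload::json->>'payload', '' order by event_id) as json_payload
from jt_tmp.build_events
group by build_id, json_time
order by build_id, event_id;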
