finding checkouts without any paired "filled hold" transactions will be how we define an "original circ".
the idea is that we match a "filled hold" with a 'checkout' based on 4 points of data op_code, patron_record_id, item_record_id, transaction_gmt
The problem is that a large portion of these "filled hold" transactions are recorded at a time that's "not exactly" the same time as the checkout transaction.
Because of this, we have to compensate for the difference (some differences can be up to 40 seconds it would appear!)
- create a new working copy of the database
- alter the table to add a new column,
transaction_sec
- populate the new column with the UNIX EPOCH (seconds) of the transaction
UPDATE circ_trans set transaction_sec = strftime('%s', transaction_gmt)
- create indexes
CREATE INDEX "idx_match_circ" ON "circ_trans" ( "patron_record_id", "item_record_id", "transaction_sec", "op_code" )
CREATE INDEX "idx_item_location_code" ON "circ_trans" ( "item_location_code" )
- use the following example query--the null values in the LEFT OUTER JOIN will be the "orginal circ"
-- explain query plan with checkout_data as ( SELECT id, patron_record_id, item_record_id, transaction_gmt, transaction_sec, item_location_code, op_code FROM circ_trans WHERE circ_trans.op_code = 'o' and circ_trans.item_location_code in ( '1pada', '1padc', '1padf', '1padm', '1padn', '1padt', '1pavm', '1pabr' ) -- limit 10 ) SELECT o.id, o.patron_record_id, o.item_record_id, o.transaction_gmt, o.item_location_code, f.id, f.transaction_gmt, strftime('%Y', o.transaction_gmt) as op_year, abs(o.transaction_sec - f.transaction_sec) as sec_diff FROM checkout_data as o LEFT OUTER JOIN circ_trans as f on ( f.op_code = 'f' and f.patron_record_id = o.patron_record_id and f.item_record_id = o.item_record_id and ( -- seems like there's an inexact time where the filled hold transaction is reported -- add a padding of 60 seconds f.transaction_sec BETWEEN (o.transaction_sec - 3600) AND (o.transaction_sec + 3600) ) ) order by sec_diff DESC
* Below is an example query for looking at a set of location codes, and also using the stat group code numbers
```sql
-- explain query plan
with checkout_data as (
SELECT
id,
patron_record_id,
item_record_id,
transaction_gmt,
transaction_sec,
item_location_code,
op_code,
stat_group_code_num
FROM
circ_trans
WHERE
circ_trans.op_code = 'o'
and circ_trans.item_location_code in (
'1pada',
'1padc',
'1padf',
'1padm',
'1padn',
'1padt',
'1pavm',
'1pabr'
)
-- limit 10
)
SELECT
-- o.id,
-- o.patron_record_id,
-- o.item_record_id,
-- o.transaction_gmt,
strftime('%Y', o.transaction_gmt) as op_year,
o.item_location_code,
location_view.location_name,
count(*) as count_original_circ
-- count(o.id) as count_original_circ,
-- o.stat_group_code_num
-- f.id,
-- f.transaction_gmt,
-- ,
-- abs(o.transaction_sec - f.transaction_sec) as sec_diff
FROM
checkout_data as o
LEFT OUTER JOIN circ_trans as f on (
f.op_code = 'f'
and f.patron_record_id = o.patron_record_id
and f.item_record_id = o.item_record_id
and (
-- seems like there's an inexact time where the filled hold transaction is reported
-- add a padding of 60 seconds
f.transaction_sec BETWEEN (o.transaction_sec - 3600) AND (o.transaction_sec + 3600)
)
)
left outer join location_view on location_view.location_code = o.item_location_code
WHERE
cast(o.stat_group_code_num as INTEGER) >= 0 and cast(o.stat_group_code_num as INTEGER) <= 62-- BETWEEN 0 and 62
AND f.id IS NULL
-- AND o.item_location_code = '1padm'
GROUP BY 1,2,3
-- order by
-- sec_diff DESC
which, generates the following data
op_year item_location_code count_original_circ
2019 1pabr 28
2019 1pada 1165
2019 1padc 9176
2019 1padf 4498
2019 1padm 47008
2019 1padn 18572
2019 1padt 23414
2022 1pabr 1488
2022 1pada 675
2022 1padc 342
2022 1padf 1233
2022 1padm 9277
2022 1padn 2830
2022 1padt 331