Skip to content

Instantly share code, notes, and snippets.

@rayvoelker
Last active January 10, 2023 19:22
Show Gist options
  • Save rayvoelker/d3ba05c49ed562dcea1547495b3dd31f to your computer and use it in GitHub Desktop.
Save rayvoelker/d3ba05c49ed562dcea1547495b3dd31f to your computer and use it in GitHub Desktop.
some notes on finding "orginal circ"

Finding Original Circulation

finding checkouts without any paired "filled hold" transactions will be how we define an "original circ".

the idea is that we match a "filled hold" with a 'checkout' based on 4 points of data op_code, patron_record_id, item_record_id, transaction_gmt

The problem is that a large portion of these "filled hold" transactions are recorded at a time that's "not exactly" the same time as the checkout transaction.

Because of this, we have to compensate for the difference (some differences can be up to 40 seconds it would appear!)

  1. create a new working copy of the database
  2. alter the table to add a new column, transaction_sec
  3. populate the new column with the UNIX EPOCH (seconds) of the transaction
    UPDATE
    circ_trans
    set transaction_sec = strftime('%s', transaction_gmt)
  4. create indexes
    CREATE INDEX "idx_match_circ" ON "circ_trans" (
    "patron_record_id",
    "item_record_id",
    "transaction_sec",
    "op_code"
    )
    CREATE INDEX "idx_item_location_code" ON "circ_trans" (
      "item_location_code"
    )
  5. use the following example query--the null values in the LEFT OUTER JOIN will be the "orginal circ"
    -- explain query plan
     with checkout_data as (
       SELECT
       id,
       patron_record_id,
       item_record_id,
       transaction_gmt,
       transaction_sec,
       item_location_code,
       op_code
       FROM
       circ_trans
       WHERE
       circ_trans.op_code = 'o'
       and circ_trans.item_location_code in (
         '1pada',
         '1padc',
         '1padf',
         '1padm',
         '1padn',
         '1padt',
         '1pavm',
         '1pabr'
       )
       -- limit 10
     )
     SELECT
     o.id,
     o.patron_record_id,
     o.item_record_id,
     o.transaction_gmt,
     o.item_location_code,
     f.id,
     f.transaction_gmt,
     strftime('%Y', o.transaction_gmt) as op_year,
     abs(o.transaction_sec - f.transaction_sec) as sec_diff
     FROM
     checkout_data as o
     LEFT OUTER JOIN circ_trans as f on (
       f.op_code = 'f'
       and f.patron_record_id = o.patron_record_id 
       and f.item_record_id = o.item_record_id
       and (
         -- seems like there's an inexact time where the filled hold transaction is reported 
         -- add a padding of 60 seconds
         f.transaction_sec BETWEEN (o.transaction_sec - 3600) AND (o.transaction_sec + 3600)
       )
     )
     order by 
     sec_diff DESC

* Below is an example query for looking at a set of location codes, and also using the stat group code numbers 

```sql
-- explain query plan
with checkout_data as (
	SELECT
	id,
	patron_record_id,
	item_record_id,
	transaction_gmt,
	transaction_sec,
	item_location_code,
	op_code,
	stat_group_code_num
	FROM
	circ_trans
	WHERE
	circ_trans.op_code = 'o'
	and circ_trans.item_location_code in (
		'1pada',
		'1padc',
		'1padf',
		'1padm',
		'1padn',
		'1padt',
		'1pavm',
		'1pabr'
	)
	-- limit 10
)
SELECT
-- o.id,
-- o.patron_record_id,
-- o.item_record_id,
-- o.transaction_gmt,
strftime('%Y', o.transaction_gmt) as op_year,
o.item_location_code,
location_view.location_name,
count(*) as count_original_circ
-- count(o.id) as count_original_circ,
-- o.stat_group_code_num
-- f.id,
-- f.transaction_gmt,
-- ,
-- abs(o.transaction_sec - f.transaction_sec) as sec_diff
FROM
checkout_data as o
LEFT OUTER JOIN circ_trans as f on (
	f.op_code = 'f'
	and f.patron_record_id = o.patron_record_id 
	and f.item_record_id = o.item_record_id
	and (
		-- seems like there's an inexact time where the filled hold transaction is reported 
		-- add a padding of 60 seconds
		f.transaction_sec BETWEEN (o.transaction_sec - 3600) AND (o.transaction_sec + 3600)
	)
)
left outer join location_view on location_view.location_code = o.item_location_code
WHERE
cast(o.stat_group_code_num as INTEGER) >= 0 and cast(o.stat_group_code_num as INTEGER) <= 62-- BETWEEN 0 and 62
AND f.id IS NULL
-- AND o.item_location_code = '1padm'
GROUP BY 1,2,3
-- order by 
-- sec_diff DESC

which, generates the following data

op_year	item_location_code	count_original_circ
2019	1pabr	28
2019	1pada	1165
2019	1padc	9176
2019	1padf	4498
2019	1padm	47008
2019	1padn	18572
2019	1padt	23414
2022	1pabr	1488
2022	1pada	675
2022	1padc	342
2022	1padf	1233
2022	1padm	9277
2022	1padn	2830
2022	1padt	331
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment