Skip to content

Instantly share code, notes, and snippets.

@MichaelLeeHobbs
Last active October 25, 2023 18:10
Show Gist options
  • Star 8 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save MichaelLeeHobbs/67980d165fc68880eb2ab283c673244b to your computer and use it in GitHub Desktop.
Save MichaelLeeHobbs/67980d165fc68880eb2ab283c673244b to your computer and use it in GitHub Desktop.
Mirth Postgres Table Size By Channel
-- 20201125T1403 EST - Fix bug in xpath not correctly matching values to correct channel
-- 20201125T1430 EST - Updated formatting only using: http://poorsql.com/ with minor hand edits
-- 20201125T1432 EST - Removed CID from final output as it was only there for debugging xpath issues
-- 20210321T1140 EST - Postgres 13+ XPATH works differently see: https://gist.github.com/MichaelLeeHobbs/40b4b7cf70ecbe30b73eed763367e626
--
-- Jon Bartels
-- Similar query for SQL Server is here - https://stackoverflow.com/questions/58942371/get-space-used-by-mirth-per-channel-in-sql-server
--
SELECT NAME AS CHANNEL_NAME
, TOTAL_BYTES
, CHANNEL_ID
, PRUNE_ENABLED
, PRUNE_DAYS
, ARCHIVE_ENABLED
, TABLE_SCHEMA
, TABLE_NAME
, MIRTH_ID
, MIRTH_TYPE
, ROW_ESTIMATE
, INDEX_BYTES
, TOAST_BYTES
, TABLE_BYTES
FROM (
WITH RECURSIVE PG_INHERIT(INHRELID, INHPARENT) AS (
SELECT INHRELID, INHPARENT
FROM PG_INHERITS
UNION
SELECT CHILD.INHRELID, PARENT.INHPARENT
FROM PG_INHERIT CHILD, PG_INHERITS PARENT
WHERE CHILD.INHPARENT = PARENT.INHRELID
)
, PG_INHERIT_SHORT AS (
SELECT *
FROM PG_INHERIT
WHERE INHPARENT NOT IN (SELECT INHRELID FROM PG_INHERIT)
)
SELECT TABLE_SCHEMA
, TABLE_NAME
, NULLIF(REGEXP_REPLACE(TABLE_NAME, '\D', '', 'g'), '')::NUMERIC AS MIRTH_ID
, SUBSTRING(TABLE_NAME FROM 'd_(\D+)\d+') AS MIRTH_TYPE
, ROW_ESTIMATE
, PG_SIZE_PRETTY(TOTAL_BYTES) AS TOTAL_BYTES
, PG_SIZE_PRETTY(INDEX_BYTES) AS INDEX_BYTES
, PG_SIZE_PRETTY(TOAST_BYTES) AS TOAST_BYTES
, PG_SIZE_PRETTY(TABLE_BYTES) AS TABLE_BYTES
FROM (
SELECT *, TOTAL_BYTES - INDEX_BYTES - COALESCE(TOAST_BYTES, 0) AS TABLE_BYTES
FROM (
SELECT C.OID
, NSPNAME AS TABLE_SCHEMA
, RELNAME AS TABLE_NAME
, SUM(C.RELTUPLES) OVER (PARTITION BY PARENT) AS ROW_ESTIMATE
, SUM(PG_TOTAL_RELATION_SIZE(C.OID)) OVER (PARTITION BY PARENT) AS TOTAL_BYTES
, SUM(PG_INDEXES_SIZE(C.OID)) OVER (PARTITION BY PARENT) AS INDEX_BYTES
, SUM(PG_TOTAL_RELATION_SIZE(RELTOASTRELID)) OVER (PARTITION BY PARENT) AS TOAST_BYTES
, PARENT
FROM (
SELECT PG_CLASS.OID
, RELTUPLES
, RELNAME
, RELNAMESPACE
, PG_CLASS.RELTOASTRELID
, COALESCE(INHPARENT, PG_CLASS.OID) PARENT
FROM PG_CLASS
LEFT JOIN PG_INHERIT_SHORT ON INHRELID = OID
WHERE RELKIND IN ('r', 'p')
) C
LEFT JOIN PG_NAMESPACE N ON N.OID = C.RELNAMESPACE
ORDER BY TOTAL_BYTES DESC
) A
WHERE OID = PARENT
) A
) TABLE_SIZES
LEFT JOIN D_CHANNELS ON D_CHANNELS.LOCAL_CHANNEL_ID = TABLE_SIZES.MIRTH_ID
LEFT JOIN CHANNEL ON CHANNEL.ID = D_CHANNELS.CHANNEL_ID
LEFT JOIN (
SELECT (XPATH('string/text()', ENTRY)) [1]::TEXT AS CID
, (XPATH('com.mirth.connect.model.ChannelMetadata/enabled/text()', ENTRY)) [1]::TEXT::boolean AS PRUNE_ENABLED
, (XPATH('com.mirth.connect.model.ChannelMetadata/pruningSettings/pruneMetaDataDays/text()', ENTRY)) [1]::TEXT::INT AS PRUNE_DAYS
, (XPATH('com.mirth.connect.model.ChannelMetadata/pruningSettings/archiveEnabled/text()', ENTRY)) [1]::TEXT::boolean AS ARCHIVE_ENABLED
FROM (
SELECT UNNEST(XPATH('/map/entry', VALUE::XML)) AS ENTRY
FROM CONFIGURATION
WHERE CATEGORY = 'core' AND NAME = 'channelMetadata'
) X
) AS M ON M.CID = D_CHANNELS.CHANNEL_ID;
@jonbartels
Copy link

jonbartels commented Nov 24, 2020

Do you really need pg_inherit in MC? AFAICT that handles table inheritance, most often used with autopartitioning in recent PG releases. To the best of my knowledge MC doesn't do that. Frankly if you have MC tables that are so large as to need partitioning you're using MC for long term storage and thats not its ideal use case.

The overall query plan is a little chunky, if you definitely do not need the inheritance counts removing them would eliminate the need for a CTE and some of the subqueries and the partitioning.

@jonbartels
Copy link

Reordering your SELECT clause may be useful, most readers will want to know "which channels big?" so do this:
SELECT name as channel_name , total_bytes , channel_id , table_schema , TABLE_NAME , mirth_id , cast (row_estimate as bigint) as row_estimate , index_bytes , toast_bytes , table_bytes

@jonbartels
Copy link

You can also add a table type like this:

, NULLIF(regexp_replace(TABLE_NAME, '\D','','g'), '')::numeric as mirth_id , substring(TABLE_NAME FROM 'd_(\D+)\d+') as mirth_type

that gives mm, mc, ma, mcm etc as a new column. This will help users know if say they're storing too much crap in maps versus dealing with big messages. A case statement might be used to translate the ma, mc, mcm to their plain-language names

@MichaelLeeHobbs
Copy link
Author

I've reordered the columns and added mirth_type. Going to leave the pg_inherit for now incase someone is storing data other than mirth data in the DB that happens to use autopartitioning. Very rare if anyone does but just in case.

@jonbartels
Copy link

jonbartels commented May 10, 2022

Posting an idea here - I'll play with it as I have time.

A slack user was facing high memory usage issues in MC. One common cause of this is having large objects in map data. This causes high memory usage because ALL of those maps are in memory.

This is relevant to the size query here because the d_mc table size could be computed by content_type. This would allow an engineer analysing space usage to determine if:

  • they have large messages (USE ATTACHMENTS)
  • they have excessive data in maps (store less stuff in maps or use attachments)
  • there are large error objects (rare, but possible)

The enum for content_type is:

    RAW(1), PROCESSED_RAW(2), TRANSFORMED(3), ENCODED(4), SENT(5), RESPONSE(
            6), RESPONSE_TRANSFORMED(7), PROCESSED_RESPONSE(8), CONNECTOR_MAP(9), CHANNEL_MAP(
                    10), RESPONSE_MAP(11), PROCESSING_ERROR(
                            12), POSTPROCESSOR_ERROR(13), RESPONSE_ERROR(14), SOURCE_MAP(15)

I think this would be best implemented as a stored procedure that takes a local_channel_id and a date range as an argument and returns a JSON object with the size breakdown by content_type. I worry this query would be inefficient since it has to actually look at table data and not just PG internal data to calculate size.

I implemented this:

EXPLAIN SELECT content_type, pg_size_pretty(sum(length(content)))
FROM d_mc3
WHERE content_type IN (9,10,11,15)
GROUP BY content_type;

The content type is part of a composite primary key, so it's indexed.

The query plan still has to do a parallel sequence scan though. I don't think it can be avoided

QUERY PLAN
Finalize GroupAggregate  (cost=66892.54..66895.61 rows=12 width=36)
  Group Key: content_type
  ->  Gather Merge  (cost=66892.54..66895.34 rows=24 width=12)
        Workers Planned: 2
        ->  Sort  (cost=65892.52..65892.55 rows=12 width=12)
              Sort Key: content_type
              ->  Partial HashAggregate  (cost=65892.18..65892.30 rows=12 width=12)
                    Group Key: content_type
                    ->  Parallel Seq Scan on d_mc3  (cost=0.00..65387.76 rows=67256 width=785)
"                          Filter: (content_type = ANY ('{9,10,11,15}'::integer[]))"

For my stated use case, perhaps computing the average map content length would be more useful to identify which channels are storing more data in maps on average than in total:

EXPLAIN SELECT content_type, pg_size_pretty(avg(length(content))), pg_size_pretty(min(length(content)::BIGINT)), pg_size_pretty(max(length(content))::BIGINT)
FROM d_mc3
WHERE content_type IN (9,10,11,15)
GROUP BY content_type;

@jonbartels
Copy link

Adding order by pg_size_bytes(total_bytes) DESC makes this query show the largest tables first.

@jonbartels
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment