Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Postgres statistics queries
** Find commmonly accessed tables and their use of indexes:
SELECT relname,seq_tup_read,idx_tup_fetch,cast(idx_tup_fetch AS numeric) / (idx_tup_fetch + seq_tup_read) AS idx_tup_pct FROM pg_stat_user_tables WHERE (idx_tup_fetch + seq_tup_read)>0 ORDER BY idx_tup_pct;
Returns output like:
relname | seq_tup_read | idx_tup_fetch | idx_tup_pct
----------------------+--------------+---------------+------------------------
schema_migrations | 817 | 0 | 0.00000000000000000000
user_device_photos | 349 | 0 | 0.00000000000000000000
albums | 530701 | 379 | 0.00071364012954733750
facebook_oauths | 15250 | 36 | 0.00235509616642679576
Analysis: For each row, because "idx_tup_pct" is low than it means that essentially no indexes are being used. In the case of "facebook_oauths"
it turns out we are commonly running a query like "SELECT * FROM facebook_oauths WHERE fb_user_id = X" and it turns out there isnt an index on "fb_user_id"
===============================================================================
** Find the INSERT/UPDATE/DELETE statistics for tables:
SELECT relname,cast(n_tup_ins AS numeric) / (n_tup_ins + n_tup_upd + n_tup_del) AS ins_pct,cast(n_tup_upd AS numeric) / (n_tup_ins + n_tup_upd + n_tup_del) AS upd_pct, cast(n_tup_del AS numeric) / (n_tup_ins
+ n_tup_upd + n_tup_del) AS del_pct
FROM pg_stat_user_tables
WHERE (n_tup_ins + n_tup_upd + n_tup_del) > 0
ORDER BY relname;
================================================================================
** Table I/O
SELECT relname,cast(heap_blks_hit as numeric) / (heap_blks_hit + heap_blks_read) AS hit_pct,
heap_blks_hit,heap_blks_read
FROM pg_statio_user_tables WHERE (heap_blks_hit + heap_blks_read)>0 ORDER BY hit_pct;
'heap_blks_hit' = the number of blocks that were satisfied from the page cache
'heap_blks_read' = the number of blocks that had to hit disk/IO layer for reads
When 'heap_blks_hit' is significantly greater than 'heap_blks_read' than it means we have a well-cached DB and most of the queries can be satisfied from the cache
================================================================================
** Table & Index sizes
SELECT
t.tablename,
indexname,
c.reltuples::integer AS num_rows,
pg_size_pretty(pg_relation_size(quote_ident(t.tablename)::text)) AS table_size,
pg_size_pretty(pg_relation_size(quote_ident(indexrelname)::text)) AS index_size,
CASE WHEN x.is_unique = 1 THEN 'Y'
ELSE 'N'
END AS UNIQUE,
idx_scan AS number_of_scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
(SELECT indrelid,
max(CAST(indisunique AS integer)) AS is_unique
FROM pg_index
GROUP BY indrelid) x
ON c.oid = x.indrelid
LEFT OUTER JOIN
( SELECT c.relname AS ctablename, ipg.relname AS indexname, x.indnatts AS number_of_columns, idx_scan, idx_tup_read, idx_tup_fetch,indexrelname FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class ipg ON ipg.oid = x.indexrelid
JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
AS foo
ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
ORDER BY pg_relation_size(quote_ident(indexrelname)::text) desc;
================================================================================
** Index Health
SELECT indexrelname,cast(idx_tup_read AS numeric) / idx_scan AS avg_tuples,idx_scan,idx_tup_read FROM pg_stat_user_indexes WHERE idx_scan > 0;
** Index Size
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM
pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE
indisunique IS false
ORDER BY idx_scan,relname;
** Index I/O - Same idea as Table I/O above
SELECT indexrelname,cast(idx_blks_hit as numeric) / (idx_blks_hit + idx_blks_read) AS hit_pct,
idx_blks_hit,idx_blks_read FROM pg_statio_user_indexes WHERE
(idx_blks_hit + idx_blks_read)>0 ORDER BY hit_pct;
** Show sizes & usage of indexes that are not used very often:
NOTE: we define 'usage' by # of times used, in this case we use '200' - change accordingly
SELECT idstat.relname AS table_name, indexrelname AS index_name, idstat.idx_scan AS times_used,
pg_size_pretty(pg_relation_size(tabstat.relid)) AS table_size, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
n_tup_upd + n_tup_ins + n_tup_del as num_writes, indexdef AS definition
FROM pg_stat_user_indexes AS idstat JOIN pg_indexes ON indexrelname = indexname
JOIN pg_stat_user_tables AS tabstat ON idstat.relname = tabstat.relname
WHERE idstat.idx_scan < 200 AND indexdef !~* 'unique'
ORDER BY idstat.relname, indexrelname;
@nobodyman

This comment has been minimized.

Copy link

commented May 22, 2015

This is a very handy set of queries, thanks for creating it! I do have a question though: I notice that you do not include seq_scans (from pg_stat_user_tables) in any of your queries. My assumption was that (in general) you could compare seq_scans vs. idx_scans values to help identify tables that might be under-utilizing tables.

I'm by no means a postgresql expert. Is this an incorrect assumption on my part? Thanks again --Jim

@ruckus

This comment has been minimized.

Copy link
Owner Author

commented May 31, 2016

Hi @nobodyman - just seeing your comment now - 1 year later.

Good question & point about extracting seq_scans - that would be good to measure and compare. For small tables PG will skip an index in favor of a seq-scan but yes, if you know you have a table that is large enough for an index and its not being used then that would be a good way to identify those queries. Thanks again for your comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.