Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Postgres statistics queries
** Find commmonly accessed tables and their use of indexes:
SELECT relname,seq_tup_read,idx_tup_fetch,cast(idx_tup_fetch AS numeric) / (idx_tup_fetch + seq_tup_read) AS idx_tup_pct FROM pg_stat_user_tables WHERE (idx_tup_fetch + seq_tup_read)>0 ORDER BY idx_tup_pct;
Returns output like:
relname | seq_tup_read | idx_tup_fetch | idx_tup_pct
----------------------+--------------+---------------+------------------------
schema_migrations | 817 | 0 | 0.00000000000000000000
user_device_photos | 349 | 0 | 0.00000000000000000000
albums | 530701 | 379 | 0.00071364012954733750
facebook_oauths | 15250 | 36 | 0.00235509616642679576
Analysis: For each row, because "idx_tup_pct" is low than it means that essentially no indexes are being used. In the case of "facebook_oauths"
it turns out we are commonly running a query like "SELECT * FROM facebook_oauths WHERE fb_user_id = X" and it turns out there isnt an index on "fb_user_id"
===============================================================================
** Find the INSERT/UPDATE/DELETE statistics for tables:
SELECT relname,cast(n_tup_ins AS numeric) / (n_tup_ins + n_tup_upd + n_tup_del) AS ins_pct,cast(n_tup_upd AS numeric) / (n_tup_ins + n_tup_upd + n_tup_del) AS upd_pct, cast(n_tup_del AS numeric) / (n_tup_ins
+ n_tup_upd + n_tup_del) AS del_pct
FROM pg_stat_user_tables
WHERE (n_tup_ins + n_tup_upd + n_tup_del) > 0
ORDER BY relname;
================================================================================
** Table I/O
SELECT relname,cast(heap_blks_hit as numeric) / (heap_blks_hit + heap_blks_read) AS hit_pct,
heap_blks_hit,heap_blks_read
FROM pg_statio_user_tables WHERE (heap_blks_hit + heap_blks_read)>0 ORDER BY hit_pct;
'heap_blks_hit' = the number of blocks that were satisfied from the page cache
'heap_blks_read' = the number of blocks that had to hit disk/IO layer for reads
When 'heap_blks_hit' is significantly greater than 'heap_blks_read' than it means we have a well-cached DB and most of the queries can be satisfied from the cache
================================================================================
** Table & Index sizes
SELECT
t.tablename,
indexname,
c.reltuples::integer AS num_rows,
pg_size_pretty(pg_relation_size(quote_ident(t.tablename)::text)) AS table_size,
pg_size_pretty(pg_relation_size(quote_ident(indexrelname)::text)) AS index_size,
CASE WHEN x.is_unique = 1 THEN 'Y'
ELSE 'N'
END AS UNIQUE,
idx_scan AS number_of_scans,
idx_tup_read AS tuples_read,
idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
(SELECT indrelid,
max(CAST(indisunique AS integer)) AS is_unique
FROM pg_index
GROUP BY indrelid) x
ON c.oid = x.indrelid
LEFT OUTER JOIN
( SELECT c.relname AS ctablename, ipg.relname AS indexname, x.indnatts AS number_of_columns, idx_scan, idx_tup_read, idx_tup_fetch,indexrelname FROM pg_index x
JOIN pg_class c ON c.oid = x.indrelid
JOIN pg_class ipg ON ipg.oid = x.indexrelid
JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
AS foo
ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
ORDER BY pg_relation_size(quote_ident(indexrelname)::text) desc;
================================================================================
** Index Health
SELECT indexrelname,cast(idx_tup_read AS numeric) / idx_scan AS avg_tuples,idx_scan,idx_tup_read FROM pg_stat_user_indexes WHERE idx_scan > 0;
** Index Size
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM
pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE
indisunique IS false
ORDER BY idx_scan,relname;
** Index I/O - Same idea as Table I/O above
SELECT indexrelname,cast(idx_blks_hit as numeric) / (idx_blks_hit + idx_blks_read) AS hit_pct,
idx_blks_hit,idx_blks_read FROM pg_statio_user_indexes WHERE
(idx_blks_hit + idx_blks_read)>0 ORDER BY hit_pct;
** Show sizes & usage of indexes that are not used very often:
NOTE: we define 'usage' by # of times used, in this case we use '200' - change accordingly
SELECT idstat.relname AS table_name, indexrelname AS index_name, idstat.idx_scan AS times_used,
pg_size_pretty(pg_relation_size(tabstat.relid)) AS table_size, pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
n_tup_upd + n_tup_ins + n_tup_del as num_writes, indexdef AS definition
FROM pg_stat_user_indexes AS idstat JOIN pg_indexes ON indexrelname = indexname
JOIN pg_stat_user_tables AS tabstat ON idstat.relname = tabstat.relname
WHERE idstat.idx_scan < 200 AND indexdef !~* 'unique'
ORDER BY idstat.relname, indexrelname;
@nobodyman

This comment has been minimized.

Copy link

@nobodyman nobodyman commented May 22, 2015

This is a very handy set of queries, thanks for creating it! I do have a question though: I notice that you do not include seq_scans (from pg_stat_user_tables) in any of your queries. My assumption was that (in general) you could compare seq_scans vs. idx_scans values to help identify tables that might be under-utilizing tables.

I'm by no means a postgresql expert. Is this an incorrect assumption on my part? Thanks again --Jim

@ruckus

This comment has been minimized.

Copy link
Owner Author

@ruckus ruckus commented May 31, 2016

Hi @nobodyman - just seeing your comment now - 1 year later.

Good question & point about extracting seq_scans - that would be good to measure and compare. For small tables PG will skip an index in favor of a seq-scan but yes, if you know you have a table that is large enough for an index and its not being used then that would be a good way to identify those queries. Thanks again for your comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment