Skip to content

Instantly share code, notes, and snippets.

@supix
Last active April 22, 2024 20:37
Show Gist options
  • Star 52 You must be signed in to star a gist
  • Fork 13 You must be signed in to fork a gist
  • Save supix/80f9a6111dc954cf38ee99b9dedf187a to your computer and use it in GitHub Desktop.
Save supix/80f9a6111dc954cf38ee99b9dedf187a to your computer and use it in GitHub Desktop.
Postgres error: Missing chunk 0 for toast value in pg_toast

The problem

In some cases, it is possible that PostgreSQL tables get corrupted. This can happen in case of hardware failures (e.g. hard disk drives with write-back cache enabled, RAID controllers with faulty/worn out battery backup, etc.), as clearly reported in this wiki page. Furthermore, it can happen in case of incorrect setup, as well.

One of the symptoms of such corruptions is the following message:

ERROR: missing chunk number 0 for toast value 123456 in pg_toast_45678

This almost surely indicates that a corrupted chunk is present within a table file. But there is a good way to get rid of it.

The solution

Let's suppose that the corrupted table is called mytable. Many articles on the Internet suggest to fire the following query against the database:

psql> select reltoastrelid::regclass from pg_class where relname = 'mytable';

 reltoastrelid      
-------------------------
 pg_toast.pg_toast_40948
(1 row)

and then to fire the following commands:

REINDEX table mytable;
REINDEX table pg_toast.pg_toast_40948;
VACUUM analyze mytable;

But in my case this was not enough. Then, I computed the number of rows in mytable:

psql> select count(*) from mytable;

 count
-------
 58223

To find the corruption, it is possible to fetch data from the table until getting the 'Missing chunk...' error. So the following group of queries does the job:

select * from mytable order by id limit 5000 offset 0;
select * from mytable order by id limit 5000 offset 5000;
select * from mytable order by id limit 5000 offset 10000;
select * from mytable order by id limit 5000 offset 15000;
select * from mytable order by id limit 5000 offset 20000;
...

...and so on until getting the error. In this example, if you reach the offset of 55000 (55000 + 5000 is 60000 which exceeds the total number of records) without getting the error, then your table is not corrupted. The order by clause is necessary to make your query repeatable, i.e. assure that the query does not randomly return rows, and limit and offset clauses work as expected. If your table does not have an id field, you have to find a good field to order by. For performance reasons, it is preferable to select an indexed field.

In order to go faster and not get your console dirty, the query can be directly triggered from the console, redirecting the output to /dev/null and printing an error message only in case of error found:

psql -U pgsql -d mydatabase -c "select * from mytable order by id limit 5000 offset 0" > /dev/null || echo "Corrupted chunk read!"

The above syntax means: execute the query and redirect the output to /dev/null or, in case of error (||), write an error message.

Let's suppose that the first query giving the error is the following:

select * from mytable order by id limit 5000 offset 10000;
Corrupted chunk read!
>

Now, you know that the corrupted chunk is in the rows between 10000 and 14999. So, you can narrow the search by halving the query LIMIT clause.

select * from mytable order by id limit 2500 offset 10000;
Corrupted chunk read!
>

So, the error happens to be in the rows between 10000 and 12499. We halve again the rows limit.

select * from mytable order by id limit 1250 offset 10000;
>

Fetching the rows between 10000 and 12499 does not return any error. So the error must be in the rows between 11250 and 12499. We can confirm this by firing the query:

select * from mytable order by id limit 1250 offset 11250;
Corrupted chunk read!
>

So, we halve again the limit.

select * from mytable order by id limit 625 offset 11250;
>
select * from mytable order by id limit 625 offset 11875;
Corrupted chunk read!
>
...

You should continue narrowing until exactly finding the corrupted row:

...
select * from mytable order by id limit 1 offset 11963;
Corrupted chunk read!
>

Note that in this last query the LIMIT 1 clause exactly identifies only one row.

Finally, you have to find the id of the corrupted row and delete it (obviously you have a data loss):

psql> select id from mytable order by id limit 1 offset 11963;
   id
--------
 121212

psql> delete from mytable where id = 121212;
DELETE 1
>

During the search of the corrupted row, consider that, most likely, the corruption is in the last inserted/updated records, even if this is not a general rule. So you can choose a sort key that respects the physical insert/update so to lower the scan time.

If you prefer to fully automate the corrupted row search, consider using the following script (in csh syntax):

#!/bin/csh
set j = 0
while ($j < 58223) //here the total number of table rows
  psql -U pgsql -d mydatabase -c "SELECT * FROM mytable LIMIT 1 offset $j" >/dev/null || echo $j
  @ j++
end

This script prints the number of all the corrupted rows. In case of long tables, it can take long time since it performs as many queries as the number of table rows.

xyhtac wrote a tool that implements this algorithm with a binary search enhancement (way faster). You can find it here.

Credits

Credits to this post and to this tweet.

@juniorccs
Copy link

Thank you very much for this

@vernes
Copy link

vernes commented Feb 17, 2020

Thanks

@Exilized
Copy link

Oh, you beautiful human being! Thank you so very, very much!

@darklow
Copy link

darklow commented Feb 13, 2021

Thanks for sharing. Here's the same in sh script:

#!/bin/sh
j=0
while [ $j -lt 58223 ]
do
  psql -U postgres -d mydb -c "SELECT * FROM mytable LIMIT 1 offset $j" >/dev/null || echo $j
  j=$(($j+1))
done

@dcalde
Copy link

dcalde commented Feb 27, 2021

It may not be necessary to delete the entire row. If you know which column(s) is stored in the particular TOAST table you can fix the corruption by simply setting the column(s) to NULL for the affected record.

UPDATE mytable SET toasted_col = NULL WHERE id=121212

@happydiass
Copy link

You save my day, Thank you

@Karis1221
Copy link

thank you, but I still same error after performed above steps. any suggestions?

@supix
Copy link
Author

supix commented Dec 4, 2021

@Karis1221 more than one corrupted row?

@Karis1221
Copy link

@supix thank you for your reply. Yes. there're 3 rows corrupted. They were deleted but still got error.

@kxccc
Copy link

kxccc commented Dec 14, 2021

Thanks

@mariaczi
Copy link

@dcalde Could you explain how to find which colum is stored in problematic TOAST?

@bukem
Copy link

bukem commented Jan 3, 2022

@mariaczi Postgresql does not refer to TOAST if it is not needed, so after identifying the corrupted row using select * from mytable ... you can identify the column that is corrupted by going through them one by one, i.e.:

SELECT toasted2_col FROM mytable WHERE id=121212; -- let's say this query executed without an error
SELECT toasted_col FROM mytable WHERE id=121212; -- and this one failed

@Bader72
Copy link

Bader72 commented Jan 12, 2022

Hello,

1- What should i do when i haven't "id" column in my table?
The sql order have ERROR!
MyDatabas=# select * from tablename order by id limit 5000 offset 0;
ERROR: column "id" does not exist
LINE 1: select * from tablenameorder by id limit 5000 offset 0;
There's really no id column but i have 2 unique indexes!!

2- the "for" script has done this results:
ERROR: missing chunk number 0 for toast value 21221813 in pg_toast_16624
182119
ERROR: missing chunk number 0 for toast value 21221812 in pg_toast_16624
182120
how can i explore it please? (remind i haven't id column in my tale).
Thank you for help.

@supix
Copy link
Author

supix commented Jan 12, 2022

@Bader72 1) In case your table does not have an id column, you can safely use a unique index, since it allows to:

  • create repeatable queries that always return the same result set;
  • unambiguously identify the corrupted row.
  1. You should execute the following query:
SELECT * FROM mytable LIMIT 1 offset 182119

It should give you the error meaning that the identified row is corrupted. So you have to find the value of your unique index for that row:

SELECT myUniquelyIndexedFieldName FROM mytable LIMIT 1 offset 182119

Let's suppose that the query gives you the result myValue. Then the corrupted row can be deleted by executing the following DML statement:

DELETE FROM mytable WHERE myUniquelyIndexedFieldName = 'myValue';

The same applies to row having offset 182120.

@Bader72
Copy link

Bader72 commented Jan 13, 2022

Hello supix,

Thank you for response.
Sorry we don't make the things samely because i have 2 unique indexes!! snifff!

MyDatabas=# \d tablename
...
Indexes:
    "nmsdelivery_id2" UNIQUE, btree (ideliveryid)
    "nmsdelivery_internalname2" UNIQUE, btree (sinternalname)
...

Now about the famous "myUniquelyIndexedFieldName", i had done this:

MyDatabas=# SELECT i.relname AS index_name,
    coalesce(
       a.attname,
       pg_get_expr(ix.indexprs, ix.indrelid)
    ) AS indexed_expression
FROM pg_class c
    INNER JOIN pg_index ix ON c.oid = ix.indrelid
    INNER JOIN pg_class i ON ix.indexrelid = i.oid
    LEFT JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = ANY (ix.indkey)
WHERE c.oid = 'public.tablename '::regclass
    AND ix.indisunique
ORDER BY array_position(ix.indkey, a.attnum) ASC;
        index_name         | indexed_expression
---------------------------+--------------------
 nmsdelivery_internalname2 | sinternalname
 nmsdelivery_id2           | ideliveryid
(2 rows)
pgfodif1=# SELECT ideliveryid FROM tablename LIMIT 1 offset 182119;
 ideliveryid
-------------
    16369924
(1 row)
MyDatabas=# SELECT ideliveryid FROM tablename  LIMIT 1 offset 182120;
 ideliveryid
-------------
    16369926
MyDatabas=# DELETE FROM tablename  WHERE ideliveryid = '16369924';
DELETE 1
MyDatabas=# DELETE FROM tablename  WHERE ideliveryid = '16369926';
DELETE 1
MyDatabas=# vacuum (full, verbose) tablename ;
INFO:  vacuuming "public.tablename "
ERROR:  missing chunk number 0 for toast value 21221813 in pg_toast_16624

No change! certainly i hadn't put the right myUniquelyIndexedFieldName?! isn't it?
how i can display it please?
Best Regards

@supix
Copy link
Author

supix commented Jan 13, 2022

@Bader72 Correct choosing either index when you have two.

Try executing again the following script.

#!/bin/csh
set j = 0
while ($j < _hereTheTableLength_)
  psql -U pgsql -d mydatabase -c "SELECT * FROM mytable order by ideliveryid LIMIT 1 OFFSET $j" >/dev/null || echo $j
  @ j++
end

Then, based on the output, you delete more corrupted rows. For example, if you have this output:

ERROR: missing chunk number 0 for toast value 21221812 in pg_toast_16624
12345

you can execute

SELECT ideliveryid FROM mytable order by ideliveryid LIMIT 1 offset 12345

Say 54321 the returned value, you execute:

DELETE FROM mytable WHERE ideliveryid = 54321

@mitchtchesnitch
Copy link

Thank you very much, this approach saved us a lot of time!

@xyhtac
Copy link

xyhtac commented May 15, 2022

Thank you so much for this insight.

I had to deal with large tables (1.2 terabytes) and thousands of corrupted fields, so i made a tool for quick binary search and recovery.

@supix
Copy link
Author

supix commented May 17, 2022

@xyhtac, thanks a lot. Linked!

@alptureci
Copy link

Hi,
Thank you for the detailed solution. I am trying to follow this solution however, when i executed the below line
REINDEX table pg_toast.pg_toast_pg_toast_2985244; I got this error :
ERROR: permission denied for schema pg_toast

I am not sure what could cause this. The user has administrative access on the database that I am trying make the fix. I granted all the Table and function rights. What am I missing? Can you please help?

@mizhka
Copy link

mizhka commented Jul 11, 2022

If anyone faces this issue, please check visibility maps as well:

SELECT c.relname, v.* 
  FROM pg_class c, 
       lateral pg_visibility(c.oid::regclass) v 
 WHERE c.relkind in ('r','t') 
   and v.all_frozen 
   and not v.pd_all_visible;

Feel free to contact me to help with such kind of issue (to fix and to find root cause).

@DauletT
Copy link

DauletT commented Nov 8, 2022

Thank you very much!

@JuniorCaldeira
Copy link

Thank you very much, you helped me a lot

@lingyanmeng
Copy link

Thanks!

@azhinu
Copy link

azhinu commented Jun 8, 2023

Really helpful.

@benzimmer
Copy link

Hey, thanks for this!
In the intro you write

Furthermore, it can happen in case of incorrect setup, as well.

Do you have examples for "incorrect setup"s?

@supix
Copy link
Author

supix commented Sep 4, 2023

@benzimmer,

Do you have examples for "incorrect setup"s?

for example giving not enough resources (e.g. memory or disk space) to run the database.

@aamir-mansoori
Copy link

Since we can get the culprit record by running above shell, after that we can run below blob to get the culprit column by passing the id and then update the culprit column with NULL.

DO
$$
DECLARE
col_list record;
column_name varchar; -- The dynamic column name
column_value varchar; -- Variable to store the retrieved column data
BEGIN
FOR col_list IN
SELECT t.table_name, t.column_name
FROM information_schema."columns" t
WHERE t.table_schema = 'your-schema-name'
AND t.table_name ='your-table-name'
ORDER BY ordinal_position
LOOP
BEGIN
-- Set the dynamic column name
column_name := col_list.column_name;

        -- Construct and execute a dynamic SQL query
        EXECUTE 'SELECT ' || column_name || ' FROM your-table-name WHERE id = 1234' 
        INTO column_value;

        -- Print the retrieved column value
        RAISE NOTICE 'Column % is ok for data %', column_name, column_value;
    EXCEPTION
        WHEN others THEN
            -- Handle exceptions here
            column_value := '-1';
            RAISE EXCEPTION 'Corrupted data occurred for column %: %', column_name, SQLERRM;
    END;
END LOOP;

END;
$$
LANGUAGE plpgsql;

@praisin
Copy link

praisin commented Mar 1, 2024

Thanks for sharing.

@ckazi
Copy link

ckazi commented Apr 22, 2024

I wrote a utility in Golang that solves this problem out of the box. Here's the link: https://github.com/ckazi/chunky

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment