Skip to content

Instantly share code, notes, and snippets.

@lo48576
Last active August 24, 2022 15:26
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save lo48576/3fc718124de4e22b97d22568031f12dc to your computer and use it in GitHub Desktop.
Save lo48576/3fc718124de4e22b97d22568031f12dc to your computer and use it in GitHub Desktop.
Mastodon 腐った DB 矯正メモ (2020-05-05) by らりお

Mastodon 腐った DB 矯正メモ (2020-05-05)

状況

  • UNIQUE 制約が破綻していたことは確認済。
  • この状態で、問題を解決しないまま pg_dump で SQL を吐き出し、 postgres:9.6-alpine から postgres:12.2-alpine へのアプデをした。
    • つまり psql -U postgres postgres みたいなやつで SQL を食わせて DB のデータを突っ込んだ。
    • 既にアホの所業
  • UNIQUE 制約を守れない index については、当然作成が失敗した。
    • コンソールのログを残さないで捨てたのでどれが失敗したか確認してない (究極のアホ)。
    • まあそもそも元データの時点で破綻しているので、復元が完全であることは期待していない。
  • いくつかの index が存在しないことで Mastodon が常用に堪えないレベルで重くなったため、いいかげん不整合の解消を決意。

破綻した UNIQUE 制約

UNIQUE 制約が守られていないことで DB が破綻していたのは確認済。 (cf. https://mastodon.cardina1.red/@lo48576/103974978595397759, https://mastodon.cardina1.red/@lo48576/103974987462202016)

schema.rb を確認すると、どうやらこの UNIQUE 制約がテーブルのカラムに直接与えられるのではなく INDEX 経由で指定されているらしいことを知る。 (ということは、重複インデックスとか迂闊に削除したら必要なはずの UNIQUE 制約が消えるということでは……)

存在しない index は SELECT tablename, indexname FROM pg_indexes;https://github.com/tootsuite/mastodon/blob/master/db/schema.rbt.index を比較することで探す。 (Ruby とか Rails に慣れてる人はもっと良い方法知ってるかもしれないけど)

やっていき

やることは単純で、

  • とりあえず (UNIQUE 制約のせいで失敗することを確認するために) index の作成を試みる
  • 一意であるべきカラムが重複している行をいくつか消して重複排除
  • 今度こそ index を作成

これでいける。

以下は私の場合なので、状況によって適当にテーブル名やカラム名をうまく選ぶこと。

index_conversations_on_uri

postgres=# CREATE UNIQUE INDEX index_conversations_on_uri ON conversations (uri);
ERROR:  could not create unique index "index_conversations_on_uri"
DETAIL:  Key (uri)=(tag:sukebeneko.com,2020-05-04:objectId=15173825:objectType=Conversation) is duplicated.
postgres=# SELECT uri, MIN(id), COUNT(id) FROM conversations GROUP BY uri HAVING COUNT(id) > 1;
                                         uri                                          |   min   |  count
--------------------------------------------------------------------------------------+---------+---------
(中略)
 tag:vocalodon.net,2020-03-08:objectId=28473409:objectType=Conversation               | 3951699 |       2
 tag:vocalodon.net,2020-03-17:objectId=28732170:objectType=Conversation               | 3987968 |       2
                                                                                      |       1 | 1072471
(214 rows)
postgres=# SELECT COUNT(*) FROM conversations WHERE uri IS NULL;
  count
---------
 1072479
(1 row)

postgres=# SELECT uri, MIN(id), COUNT(id) FROM conversations WHERE uri IS NOT NULL GROUP BY uri HAVING COUNT(id) > 1;
(中略)
 tag:vocalodon.net,2020-03-17:objectId=28732170:objectType=Conversation               | 3987968 |     2
(214 rows)

postgres=# SELECT uri, id FROM conversations WHERE uri IS NOT NULL AND id NOT IN (SELECT MIN(id) FROM conversations WHERE uri IS NOT NULL GROUP BY uri);
(↑遅すぎて話にならない。中断。)
^CCancel request sent
ERROR:  canceling statement due to user request
postgres=# SELECT MAX(id), uri FROM conversations WHERE uri IS NOT NULL GROUP BY uri HAVING COUNT(id) > 1;
(中略)
 3987970 | tag:vocalodon.net,2020-03-17:objectId=28732170:objectType=Conversation
(215 rows)

postgres=# DELETE FROM conversations WHERE id IN (SELECT MAX(id) FROM conversations WHERE uri IS NOT NULL GROUP BY uri HAVING COUNT(id) > 1);
DELETE 215
postgres=# DELETE FROM conversations WHERE id IN (SELECT MAX(id) FROM conversations WHERE uri IS NOT NULL GROUP BY uri HAVING COUNT(id) > 1);
DELETE 1
postgres=# DELETE FROM conversations WHERE id IN (SELECT MAX(id) FROM conversations WHERE uri IS NOT NULL GROUP BY uri HAVING COUNT(id) > 1);
DELETE 0
postgres=# CREATE UNIQUE INDEX index_conversations_on_uri ON conversations (uri);
CREATE INDEX

本当は重複排除で DELETE FROM conversations WHERE uri IS NOT NULL AND id NOT IN (SELECT MIN(id) FROM conversations WHERE uri IS NOT NULL GROUP BY uri); のようにしたかったが、これがあまりに重くてどうしようもなかったため、手動で何回も DELETE を回す方針に切り替えた。 (ちなみにこのクエリは「同じ uri を持つ行の中で最も id が小さい行をそれぞれ抽出し、その行と uriNULL である行を除いて全て消す」という感じ。)

待ち時間を考えるのが面倒だったので、以下もっと軽いテーブルについても全て同様の方針でやっていった。

index_custom_emojis_on_shortcode_and_domain

postgres=# CREATE UNIQUE INDEX index_custom_emojis_on_shortcode_and_domain on custom_emojis (shortcode, domain);
ERROR:  could not create unique index "index_custom_emojis_on_shortcode_and_domain"
DETAIL:  Key (shortcode, domain)=(_____kohu, mstdn.plusminus.io) is duplicated.
postgres=# SELECT MIN(shortcode), MIN(domain), COUNT(*) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1;
(中略)
 SuperFastSpin                                                                                                                          | yysk.icu                    |     2
 _na                                                                                                                                    | mstdn.plusminus.io          |     2
 dsno_n                                                                                                                                 | cutls.com                   |     2
(792 rows)

postgres=# DELETE FROM custom_emojis WHERE id IN (SELECT MAX(id) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1);
DELETE 792
postgres=# DELETE FROM custom_emojis WHERE id IN (SELECT MAX(id) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1);
DELETE 160
postgres=# DELETE FROM custom_emojis WHERE id IN (SELECT MAX(id) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1);
DELETE 55
postgres=# DELETE FROM custom_emojis WHERE id IN (SELECT MAX(id) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1);
DELETE 18
postgres=# DELETE FROM custom_emojis WHERE id IN (SELECT MAX(id) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1);
DELETE 4
postgres=# DELETE FROM custom_emojis WHERE id IN (SELECT MAX(id) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1);
DELETE 2
postgres=# DELETE FROM custom_emojis WHERE id IN (SELECT MAX(id) FROM custom_emojis GROUP BY (shortcode, domain) HAVING COUNT(*) > 1);
DELETE 0
postgres=# CREATE UNIQUE INDEX index_custom_emojis_on_shortcode_and_domain on custom_emojis (shortcode, domain);
CREATE INDEX
postgres=#

index_preview_cards_on_url

postgres=# CREATE UNIQUE INDEX index_preview_cards_on_url ON preview_cards (url);
ERROR:  could not create unique index "index_preview_cards_on_url"
DETAIL:  Key (url)=(https://komittee-express.netlify.com/posts/my-recommending-mastodon-server-20191114/) is duplicated.
postgres=# SELECT url, COUNT(*) FROM preview_cards GROUP BY url HAVING COUNT(*) > 1;
(中略)
(1152 rows)

postgres=# DELETE FROM preview_cards WHERE id IN (SELECT MAX(id) FROM preview_cards GROUP BY url HAVING COUNT(*) > 1);
DELETE 1152
postgres=# DELETE FROM preview_cards WHERE id IN (SELECT MAX(id) FROM preview_cards GROUP BY url HAVING COUNT(*) > 1);
DELETE 51
postgres=# DELETE FROM preview_cards WHERE id IN (SELECT MAX(id) FROM preview_cards GROUP BY url HAVING COUNT(*) > 1);
DELETE 4
postgres=# DELETE FROM preview_cards WHERE id IN (SELECT MAX(id) FROM preview_cards GROUP BY url HAVING COUNT(*) > 1);
DELETE 0
postgres=# CREATE UNIQUE INDEX index_preview_cards_on_url ON preview_cards (url);
CREATE INDEX
postgres=#

index_statuses_on_uri

postgres=# CREATE UNIQUE INDEX index_statuses_on_uri ON statuses (uri);
ERROR:  could not create unique index "index_statuses_on_uri"
DETAIL:  Key (uri)=(https://baraag.net/users/nakaishow/statuses/103805517317631658) is duplicated.
postgres=# SELECT uri, COUNT(*) FROM statuses GROUP BY uri HAVING COUNT(*) > 1;
(中略)
 https://yysk.icu/users/kozue/statuses/104105116094270839                                 |     2
 https://zenyasai.g-fukurowl.club/users/g_fukurowl_zenyasai/statuses/104113755277985520   |     2
(1046 rows)

postgres=# DELETE FROM statuses WHERE id IN (SELECT MAX(id) FROM statuses GROUP BY uri HAVING COUNT(*) > 1);
DELETE 1046
postgres=# DELETE FROM statuses WHERE id IN (SELECT MAX(id) FROM statuses GROUP BY uri HAVING COUNT(*) > 1);
DELETE 43
postgres=# DELETE FROM statuses WHERE id IN (SELECT MAX(id) FROM statuses GROUP BY uri HAVING COUNT(*) > 1);
DELETE 3
postgres=# DELETE FROM statuses WHERE id IN (SELECT MAX(id) FROM statuses GROUP BY uri HAVING COUNT(*) > 1);
DELETE 1
postgres=# DELETE FROM statuses WHERE id IN (SELECT MAX(id) FROM statuses GROUP BY uri HAVING COUNT(*) > 1);
DELETE 0
postgres=# CREATE UNIQUE INDEX index_statuses_on_uri ON statuses (uri);
CREATE INDEX
postgres=#

index_tags_on_name_lower

postgres=# CREATE UNIQUE INDEX index_tags_on_name_lower ON tags (lower((name)::text));
ERROR:  could not create unique index "index_tags_on_name_lower"
DETAIL:  Key (lower(name::text))=(庵点とは読めないよね) is duplicated.
postgres=# SELECT name, COUNT(*) FROM tags GROUP BY name HAVING COUNT(*) > 1;
(中略)
 無料                                                                 |    11
 末代thatskygame部                                                    |     3
(2345 rows)

postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 2345
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 653
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 275
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 120
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 68
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 39
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 27
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 16
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 11
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 8
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 4
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 2
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 2
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 2
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 2
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 2
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY name HAVING COUNT(*) > 1);
DELETE 0
postgres=# CREATE UNIQUE INDEX index_tags_on_name_lower ON tags (lower((name)::text));
ERROR:  could not create unique index "index_tags_on_name_lower"
DETAIL:  Key (lower(name::text))=(meiko) is duplicated.
postgres=# SELECT lower((name)::text), COUNT(*) FROM tags GROUP BY lower((name)::text) HAVING COUNT(*) > 1;
     lower      | count
----------------+-------
 customcast     |     2
 naroun7491fi   |     2
 x2f            |     2
 meiko          |     2
 linux          |     2
 mastoart       |     2
 latech         |     2
 visiblewomen   |     2
 prismo         |     2
 instanceticker |     2
 stopcovid19jp  |     2
 プリコネr      |     2
 tinla          |     2
 activitypub    |     2
 kyash          |     2
 covid19        |     2
 pixelfed       |     2
 mastodontips   |     2
 vrchat         |     2
(19 rows)

postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY lower((name)::text) HAVING COUNT(*) > 1);
DELETE 19
postgres=# DELETE FROM tags WHERE id IN (SELECT MAX(id) FROM tags GROUP BY lower((name)::text) HAVING COUNT(*) > 1);
DELETE 0
postgres=# CREATE UNIQUE INDEX index_tags_on_name_lower ON tags (lower((name)::text));
CREATE INDEX
postgres=#

所感

正直ありえないと思う。 どんな狂った設計したらこんな不整合が発生するの。 Mastodon の品質への不信がムクムクと育っています。

頭が悪いので、バックアップとってないサーバでサービス稼働状態のまま直接 SQL 叩いてる

--- https://mastodon.cardina1.red/@lo48576/104114210440486200

生きてる本番サーバで叩く DELETE は最高に楽しい、今この瞬間を生きているという強烈な実感を持てるよ (適当)

--- https://mastodon.cardina1.red/@lo48576/104114214981273758

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment