tateisu/gist:58d8e566f67a2cd3aa999c7c4b5e6ee1

## gistfile1.md

      
    Raw
  

              gistfile1.md
            
          
    マストドンのLTLを見るクエリの話です。
https://mastodon.zunda.ninja/@zundan/98830605465042155 に例が出てたのでクエリ最適化を検討してみます。
statusテーブルの最近のデータがリモート多め未収載多めだったりしてLTLに出せるトゥートが少ないと遅くなるらしいです。
(うちはそんなでもないけど)
元のクエリ

SELECT "statuses"."id", "statuses"."updated_at" FROM "statuses" 
LEFT OUTER JOIN "accounts" ON "accounts"."id" = "statuses"."account_id" 
WHERE ("statuses"."local" = TRUE OR "statuses"."uri" IS NULL) 
AND "statuses"."visibility" = 0 
AND (statuses.reblog_of_id IS NULL) 
AND (statuses.reply = FALSE OR statuses.in_reply_to_account_id = statuses.account_id) 
AND "accounts"."silenced" = FALSE 
ORDER BY "statuses"."id" DESC LIMIT 40

Limit  (cost=0.72..333.35 rows=40 width=16) (actual time=0.216..7.819 rows=40 loops=1)
  ->  Nested Loop  (cost=0.72..3025919.06 rows=363879 width=16) (actual time=0.214..7.778 rows=40 loops=1)
        ->  Index Scan Backward using statuses_pkey on statuses  (cost=0.43..1645289.57 rows=363948 width=24) (actual time=0.100..6.989 rows=40 loops=1)
              Filter: ((local OR (uri IS NULL)) AND (reblog_of_id IS NULL) AND ((NOT reply) OR (in_reply_to_account_id = account_id)) AND (visibility = 0))
              Rows Removed by Filter: 1995
        ->  Index Scan using accounts_pkey on accounts  (cost=0.29..3.78 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=40)
              Index Cond: (id = statuses.account_id)
              Filter: (NOT silenced)
Planning time: 0.698 ms
Execution time: 7.922 ms

joinをin(select...) に置き換える


accountsテーブルのデータはクエリ結果には使われないので、サブクエリにすると結合するコストが下がったり、条件が良ければIndex Only Scanになってデータアクセスを減らせたりします
rails的には mastodon/mastodon#5373 と同じ方法で出来るのかな。
でも今回の例だとsilencedカラムを見てるから、結局アカウントデータの実体までアクセスが必要になっちゃう

SELECT "statuses"."id", "statuses"."updated_at" FROM "statuses" 
WHERE ("statuses"."local" = TRUE OR "statuses"."uri" IS NULL) 
AND "statuses"."visibility" = 0 
AND (statuses.reblog_of_id IS NULL) 
AND (statuses.reply = FALSE OR statuses.in_reply_to_account_id = statuses.account_id) 
AND account_id in ( select "accounts"."id" from "accounts" where "accounts"."id" = "statuses"."account_id"  and "accounts"."silenced" = FALSE)
ORDER BY "statuses"."id" DESC LIMIT 40

Limit  (cost=0.43..3493.71 rows=40 width=16) (actual time=0.175..4.348 rows=40 loops=1)
  ->  Index Scan Backward using statuses_pkey on statuses  (cost=0.43..15892174.69 rows=181974 width=16) (actual time=0.173..4.312 rows=40 loops=1)
        Filter: ((local OR (uri IS NULL)) AND (reblog_of_id IS NULL) AND ((NOT reply) OR (in_reply_to_account_id = account_id)) AND (visibility = 0) AND (SubPlan 1))
        Rows Removed by Filter: 2008
        SubPlan 1
          ->  Index Scan using accounts_pkey on accounts  (cost=0.29..8.31 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=40)
                Index Cond: (id = statuses.account_id)
                Filter: (NOT silenced)
Planning time: 0.390 ms
Execution time: 4.485 ms

インデックスを作ってみる


部分インデックスというやつを使うと「特定の条件を満たす行」へのインデックスを作成できます

PostgreSQL 9.6.5文書 11.8. 部分インデックス
https://www.postgresql.jp/document/9.6/html/indexes-partial.html
CREATE INDEX accounts_not_silenced ON accounts using btree 
(id)
WHERE not silenced;

CREATE INDEX statuses_public_local ON statuses  using btree 
(id)
WHERE "statuses"."visibility" = 0 
AND (statuses.reblog_of_id IS NULL)
AND (statuses.reply = FALSE OR statuses.in_reply_to_account_id = statuses.account_id) 
AND ("statuses"."local" = TRUE OR "statuses"."uri" IS NULL);


(2017-10-15 18:10) FTLでは部分インデックスが使われなかったのでLTL専用のインデックスに書き直しました

元のクエリ＋インデックス

SELECT "statuses"."id", "statuses"."updated_at" FROM "statuses" 
LEFT OUTER JOIN "accounts" ON "accounts"."id" = "statuses"."account_id" 
WHERE ("statuses"."local" = TRUE OR "statuses"."uri" IS NULL) 
AND "statuses"."visibility" = 0 
AND (statuses.reblog_of_id IS NULL) 
AND (statuses.reply = FALSE OR statuses.in_reply_to_account_id = statuses.account_id) 
AND "accounts"."silenced" = FALSE 
ORDER BY "statuses"."id" DESC LIMIT 40

Limit  (cost=0.71..149.74 rows=40 width=16) (actual time=0.123..0.844 rows=40 loops=1)
  ->  Nested Loop  (cost=0.71..1256574.18 rows=337276 width=16) (actual time=0.122..0.809 rows=40 loops=1)
        ->  Index Scan Backward using statuses_public_local on statuses  (cost=0.42..187072.73 rows=337340 width=24) (actual time=0.073..0.377 rows=40 loops=1)
        ->  Index Only Scan using accounts_not_silenced on accounts  (cost=0.29..3.16 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=40)
              Index Cond: (id = statuses.account_id)
              Heap Fetches: 40
Planning time: 0.899 ms
Execution time: 0.923 ms

in(select...) + インデックス

SELECT "statuses"."id", "statuses"."updated_at" FROM "statuses" 
WHERE ("statuses"."local" = TRUE OR "statuses"."uri" IS NULL) 
AND "statuses"."visibility" = 0 
AND (statuses.reblog_of_id IS NULL) 
AND (statuses.reply = FALSE OR statuses.in_reply_to_account_id = statuses.account_id) 
AND account_id in ( select "accounts"."id" from "accounts" where "accounts"."id" = "statuses"."account_id"  and "accounts"."silenced" = FALSE)
ORDER BY "statuses"."id" DESC LIMIT 40

Limit  (cost=0.42..388.20 rows=40 width=16) (actual time=0.057..0.477 rows=40 loops=1)
  ->  Index Scan Backward using statuses_public_local on statuses  (cost=0.42..1635157.74 rows=168670 width=16) (actual time=0.056..0.435 rows=40 loops=1)
        Filter: (SubPlan 1)
        SubPlan 1
          ->  Index Only Scan using accounts_not_silenced on accounts  (cost=0.29..8.31 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=40)
                Index Cond: (id = statuses.account_id)
                Heap Fetches: 40
Planning time: 0.438 ms
Execution time: 0.536 ms

感想


結構はやくなりました。
ただ、マストドン公式はあまりインデックス増やさない方針らしいんですよねー。
インデックス増やすだけなら鯖缶が手作業で出来ちゃうし試してみるとよいかも？
statuses_public_local インデックスのサイズはLTLに出るトゥートの数に比例するので、LTL過疎気味のタンスだと容量消費は軽微です