Skip to content

Instantly share code, notes, and snippets.

@paolobarbolini
Last active August 2, 2023 05:52
Show Gist options
  • Save paolobarbolini/b5b97a282d5feced736b7c0a70c7814e to your computer and use it in GitHub Desktop.
Save paolobarbolini/b5b97a282d5feced736b7c0a70c7814e to your computer and use it in GitHub Desktop.
crates.io `serde` reverse dependencies query EXPLAIN ANALYZE output
Limit (cost=513870.08..534308.54 rows=10 width=99) (actual time=10808.569..12615.884 rows=10 loops=1)
-> Unique (cost=513870.08..3444730.43 rows=1434 width=99) (actual time=10808.566..12615.851 rows=10 loops=1)
-> Nested Loop (cost=513870.08..3444723.26 rows=1434 width=99) (actual time=10808.562..12615.801 rows=13 loops=1)
Join Filter: (versions.crate_id = crates.id)
Rows Removed by Join Filter: 887880
-> Index Scan using idx3 on crates (cost=0.42..54507.51 rows=121264 width=20) (actual time=0.031..0.344 rows=31 loops=1)
-> Materialize (cost=513869.66..781830.70 rows=1434 width=83) (actual time=261.982..373.001 rows=28642 loops=31)
-> Nested Loop (cost=513869.66..781823.53 rows=1434 width=83) (actual time=8121.412..10487.713 rows=29474 loops=1)
-> Subquery Scan on versions (cost=513869.23..740005.96 rows=4002 width=8) (actual time=8121.369..10121.764 rows=29753 loops=1)
Filter: (versions.rn = 1)
Rows Removed by Filter: 295421
-> WindowAgg (cost=513869.23..729999.91 rows=800484 width=497) (actual time=8121.366..9715.411 rows=325174 loops=1)
-> Sort (cost=513869.23..515870.44 rows=800484 width=40) (actual time=8121.349..8765.434 rows=325174 loops=1)
Sort Key: versions_1.crate_id, (to_semver_no_prerelease((versions_1.num)::text)) DESC NULLS LAST
Sort Method: external merge Disk: 15968kB
-> Hash Join (cost=169542.77..413490.72 rows=800484 width=40) (actual time=4119.683..7469.829 rows=325174 loops=1)
Hash Cond: (versions_1.crate_id = versions_2.crate_id)
-> Seq Scan on versions versions_1 (cost=0.00..32820.30 rows=800484 width=14) (actual time=0.013..1028.906 rows=799607 loops=1)
Filter: (NOT yanked)
Rows Removed by Filter: 48723
-> Hash (cost=169033.98..169033.98 rows=40703 width=4) (actual time=4119.516..4119.533 rows=30630 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 1589kB
-> HashAggregate (cost=168626.95..169033.98 rows=40703 width=4) (actual time=4045.309..4082.804 rows=30630 loops=1)
Group Key: versions_2.crate_id
Batches: 1 Memory Usage: 3089kB
-> Hash Join (cost=39183.24..167909.75 rows=286879 width=4) (actual time=2097.373..3672.798 rows=294498 loops=1)
Hash Cond: (dependencies_1.version_id = versions_2.id)
-> Bitmap Heap Scan on dependencies dependencies_1 (cost=3223.74..125641.19 rows=286879 width=4) (actual time=16.090..808.300 rows=294498 loops=1)
Recheck Cond: (crate_id = 463)
Rows Removed by Index Recheck: 3165463
Heap Blocks: exact=36463 lossy=33035
-> Bitmap Index Scan on index_dependencies_crate_id (cost=0.00..3152.02 rows=286879 width=0) (actual time=11.914..11.915 rows=294498 loops=1)
Index Cond: (crate_id = 463)
-> Hash (cost=22041.37..22041.37 rows=848330 width=8) (actual time=2081.207..2081.211 rows=848330 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3099kB
-> Index Only Scan using idx1 on versions versions_2 (cost=0.42..22041.37 rows=848330 width=8) (actual time=0.014..1019.273 rows=848330 loops=1)
Heap Fetches: 0
-> Index Scan using index_dependencies_version_id on dependencies (cost=0.43..10.43 rows=2 width=79) (actual time=0.006..0.008 rows=1 loops=29753)
Index Cond: (version_id = versions.id)
Filter: (crate_id = 463)
Rows Removed by Filter: 10
Planning Time: 0.656 ms
Execution Time: 12618.394 ms
EXPLAIN ANALYZE SELECT * FROM (
-- Multple dependencies can exist, make it distinct
SELECT DISTINCT ON (crate_downloads, crate_id2)
dependencies.*,
crates.id AS crate_id2,
crates.downloads AS crate_downloads,
crates.name AS crate_name
FROM dependencies
-- We only want the crates whose *max* version is dependent, so we join on a
-- subselect that includes the versions with their ordinal position
INNER JOIN (
SELECT versions.*,
row_number() OVER (
PARTITION BY crate_id
ORDER BY to_semver_no_prerelease(num) DESC NULLS LAST
) rn
FROM versions
WHERE NOT yanked
-- This is completely redundant, but it's faster to filter the versions
-- early even if this subselect is done via an index scan.
AND crate_id = ANY(
SELECT versions.crate_id
FROM versions
INNER JOIN dependencies
ON dependencies.version_id = versions.id
WHERE dependencies.crate_id = 463
)
) versions
ON versions.id = dependencies.version_id
INNER JOIN crates
ON crates.id = versions.crate_id
WHERE dependencies.crate_id = 463
AND rn = 1
ORDER BY crate_downloads DESC
) t
OFFSET 0
LIMIT 10;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment