Skip to content

Instantly share code, notes, and snippets.

@antzucaro
Created March 14, 2021 18:37
Show Gist options
  • Save antzucaro/bcaac033002db6801f82078cae484d37 to your computer and use it in GitHub Desktop.
Save antzucaro/bcaac033002db6801f82078cae484d37 to your computer and use it in GitHub Desktop.
Query Tuning - Top Scorers by Server ID

Server "top scorers" query

The original query in Python, using the ORM:

top_scorers_q = DBSession.query(
    fg.row_number().over(
        order_by=expr.desc(func.sum(PlayerGameStat.score))).label("rank"),
    Player.player_id, Player.nick,
    func.sum(PlayerGameStat.score).label("total_score"))\
    .filter(Player.player_id == PlayerGameStat.player_id)\
    .filter(Game.game_id == PlayerGameStat.game_id)\
    .filter(Game.server_id == self.server_id)\
    .filter(Player.player_id > 2)\
    .filter(PlayerGameStat.create_dt >
            (self.now - timedelta(days=self.lifetime)))\
    .order_by(expr.desc(func.sum(PlayerGameStat.score)))\
    .group_by(Player.nick)\
    .group_by(Player.player_id)\
    .limit(self.limit)

...which, when run with SQL debugging on turns into the following raw SQL:

SELECT 
row_number() OVER (ORDER BY sum(player_game_stats.score) DESC) AS rank, 
players.player_id AS players_player_id, 
players.nick AS players_nick, 
sum(player_game_stats.score) AS total_score 

FROM 
player_game_stats, players, games 

WHERE 
players.player_id = player_game_stats.player_id 
AND games.game_id = player_game_stats.game_id 
AND games.server_id = 25320
AND players.player_id > 2
AND player_game_stats.create_dt > '2021-02-17' 

GROUP BY players.nick, players.player_id 
ORDER BY sum(player_game_stats.score) DESC 
LIMIT 10

The query plan estimated cost for this statement on PostgreSQL 12 is large:

Limit (cost=375431.74..375431.91 rows=10 width=38)

The query plan estimated cost for this statement on PostgreSQL 9 is even larger:

Limit (cost=1174870.48..1174870.65 rows=10 width=26)

The tuned query is as follows:

SELECT 
row_number() OVER (ORDER BY sum(player_game_stats.score) DESC) AS rank, 
players.player_id AS players_player_id, players.nick AS players_nick, 
sum(player_game_stats.score) AS total_score

FROM player_game_stats
INNER JOIN players USING (player_id)
INNER JOIN games USING (game_id)

WHERE games.server_id = 25320
AND players.player_id > 2 
AND player_game_stats.create_dt BETWEEN '2021-02-17' AND (now() at time zone 'UTC' + interval '1 day')
AND games.create_dt BETWEEN '2021-02-17' AND (now() at time zone 'UTC' + interval '1 day')

GROUP BY players.nick, players.player_id 
ORDER BY total_score DESC
LIMIT 10 

The query plan estimated cost for this statement on PostgreSQL 12:

Limit (cost=83584.72..83584.90 rows=10 width=38)

The query plan estimated cost for this statement on PostgreSQL 9:

Limit (cost=143167.59..143167.76 rows=10 width=26)

If we construct a "materialized view" of the same information, we can drastically reduce the impact.

CREATE TABLE recent_game_stats AS

SELECT 
g.game_id, g.server_id, g.map_id, p.player_id, p.nick, pgs.score, pgs.alivetime

FROM player_game_stats pgs
INNER JOIN players p USING (player_id)
INNER JOIN games g USING (game_id)

WHERE
p.player_id > 2 
AND p.active_ind = true
AND pgs.create_dt BETWEEN '2021-02-17' AND (now() at time zone 'UTC' + interval '1 day')
AND g.create_dt BETWEEN '2021-02-17' AND (now() at time zone 'UTC' + interval '1 day');

CREATE INDEX recent_game_stats_ix001 on recent_game_stats(server_id);
CREATE INDEX recent_game_stats_ix002 on recent_game_stats(map_id);

The query then becomes:

SELECT 
row_number() OVER (ORDER BY sum(score) DESC) AS rank, 
player_id, nick, 
sum(score) AS total_score

FROM recent_game_stats

WHERE server_id = 25320

GROUP BY nick, player_id 
ORDER BY total_score DESC
LIMIT 10 

The query plan estimated cost for this is much less (only available on PG 12):

Limit (cost=1094.68..1094.85 rows=10 width=42)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment