@gpluscb
Last active April 29, 2023 02:20
So You Want to Use Glicko-2 for Your Game's Ratings

I wrote this article right after I published the first version of instant-glicko-2. It is meant to document how to implement a Glicko-2 algorithm that allows for instant feedback after games.

Great! Glicko-2 is a very cool rating system. And a popular choice too! Lichess, CS:GO, and Splatoon 2 all use the Glicko-2 system.

It also offers unique advantages.

Glicko the first

Glicko-2 builds on the original Glicko rating system. Glicko aims to improve on Elo by adding a measure of rating uncertainty, the "ratings deviation" (RD).

Using this value, we can calculate a confidence interval in which the player's actual strength most likely lies. If r is the rating, the player's actual strength is expected to lie between r - 2RD and r + 2RD in 95% of cases.
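As a quick illustration, here is a minimal sketch of that interval in Python. The function name `confidence_interval` is made up for this example and is not taken from any library.

```python
def confidence_interval(rating: float, rd: float) -> tuple[float, float]:
    """Return the (low, high) bounds of the ~95% interval r - 2RD .. r + 2RD."""
    return rating - 2 * rd, rating + 2 * rd


# A player rated 1850 with an RD of 50 is most likely between 1750 and 1950.
low, high = confidence_interval(1850, 50)
print(low, high)
```

A fresh player with a large RD gets a much wider interval: a rating of 1500 with an RD of 200 gives 1100 to 1900.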

This RD value always decreases with every game the player plays - after all, a played game is a good clue to the player's actual strength. And when the player doesn't play games, it increases with the time of inactivity. So if a player stops playing rated games for a year, we are less certain about their strength when they come back.
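To make the inactivity part concrete, here is a small sketch of how the original Glicko paper grows the RD over t rating periods without games: RD = min(sqrt(RD_old² + c²·t), 350). The constant c is something you have to pick for your game. The value used below is only an assumption, chosen so that an RD of 50 grows back to 350 after 100 idle rating periods. The function name `inflate_rd` is hypothetical.

```python
import math

MAX_RD = 350.0  # RD of a completely unrated player in Glicko


def inflate_rd(rd: float, c: float, idle_periods: float) -> float:
    """Grow the RD after `idle_periods` rating periods without games (Glicko-1)."""
    return min(math.sqrt(rd**2 + c**2 * idle_periods), MAX_RD)


# Assumed tuning: c is picked so that RD 50 reaches 350 after 100 idle periods.
c = math.sqrt((350**2 - 50**2) / 100)

print(inflate_rd(50, c, 10))   # a bit less certain after 10 idle periods
print(inflate_rd(50, c, 500))  # capped at 350, as uncertain as a new player
```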

To make the RD grow over time, Glicko also introduces a little devil named "rating period". But we'll think about that when we actually try to use Glicko-2 for our game.

Glicko-2 (the sequel)

Glicko-2 aims to further improve on Glicko by introducing another variable to the rating, the "rating volatility" (σ). This value describes the expected fluctuation in rating. If the value is high, the player's performance is expected to fluctuate a lot, and if it is low, they are expected to be very consistent. The value is not part of the confidence interval formula discussed above, though it does influence how quickly the RD grows over time.

The average value will be higher for games that, for example, require some amount of luck, or where fewer games per match are played.

The value doesn't change during times of inactivity.

For intermission, listen to me go on a short rant about esports that is only kinda related

Now if you allow me to go on a small tangent here, I see potential for some great marketing in this rating volatility too. Give it a catchy name, and you can have stories about how players with a high X-Factor have the most exciting and dramatic performances. Anything can happen when they're on stage.

Meanwhile, players with a very low X-Factor are walls. They are extremely solid, experts in dealing with every playstyle you can throw at them, and they are a true test of strength. If you beat them, your improvement paid off. After all, beating them is very unlikely to be a fluke.

These stories happen organically in competitive games. For example, you'll hear a lot of commentators comment on how consistent and how much of a wall Dabuz is when he is on stage in Smash Bros. broadcasts. He has even been crowned "King of Consistency" by respected Smash Bros. community ranking authority PGstats.

The same article that names Dabuz as a very consistent player also names Marss as a player who is the opposite.

When Marss is hot, he is nigh unbeatable by anyone outside of the top 5 players in the world. When he is playing at his best, his potential is limitless. The problem is consistency [...].

I think the possibility to capture those stories in a value even for players who are not at the very top and who will not have such articles written about them is very exciting.

Implementation

The implementation should be relatively straightforward. We just look at the steps described in Glickman's paper, and we're good. Just one little problem...

Screenshot from Glickman's paper. The section is "The formulas". The highlighted text reads: To apply the rating algorithm, we treat a collection of games within a "rating period" to have occurred simultaneously. Players would have ratings, RD's, and volatilities at the beginning of the rating period, game outcomes would be observed, and then updated ratings, RD's and volatilities would be computed at the end of the rating period

Do you spot it?

I brushed away the little devil named "rating period" earlier, and now it's coming back to haunt us.

We can only calculate ratings when such a rating period completes, and they don't complete after every game! In fact, Glickman recommends an average of at least 10-15 games per player in every rating period. So this is something we need to work around if we want to show our players how their rating changed after a game. There are multiple approaches.

One simple approach is described in a blogpost by Ryan Juckett titled "The Online Skill Ranking of INVERSUS Deluxe". But this approach also has drawbacks. The later blogpost "Additional Thoughts on Skill Ratings" addresses these, and proposes a potential solution.

This solution seems to be very similar or even identical to the one Lichess uses. And one great thing is: Lichess is open source!

The crux of the solution is to allow fractional rating periods. We can now evaluate temporary ratings for a specific point in time in a rating period, and work with that. The secret sauce can be found in the RatingCalculator class in the Lichess implementation. Or, alternatively, in my own repo, for which I stole it :).
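Here is a rough sketch of the fractional idea as I understand it: instead of growing the RD by one full rating period at a time, we grow it by however much of a period has elapsed since the player's rating was last updated. The growth step sqrt(φ² + σ²) is from Glickman's paper. Scaling σ² by the elapsed fraction is the assumed fractional twist. All function and parameter names here are made up for illustration and are not the Lichess or instant-glicko-2 API.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def elapsed_periods(last_update_s: float, now_s: float, period_length_s: float) -> float:
    """How many (possibly fractional) rating periods passed since the last update."""
    return max(0.0, (now_s - last_update_s) / period_length_s)


def rd_at(rd: float, sigma: float, elapsed: float) -> float:
    """RD after `elapsed` rating periods, growing with the volatility sigma."""
    phi = rd / SCALE  # RD on the internal Glicko-2 scale
    return math.sqrt(phi**2 + elapsed * sigma**2) * SCALE


# Assumed setup: rating periods last one week, the player last played 3.5 days ago.
week = 7 * 24 * 60 * 60
t = elapsed_periods(last_update_s=0, now_s=week / 2, period_length_s=week)
print(t)                       # 0.5
print(rd_at(200.0, 0.06, t))   # slightly above 200
```

With `elapsed` set to exactly 1 this is the same RD growth as in the paper, and with 0 the RD is unchanged.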

So our new strategy for calculating a player's rating at a given point in time is:

  1. If necessary, close every rating period for our players that hasn't been closed yet and commit their ratings.
    We do this by just performing the steps described in the paper.
  2. Get every result for the player in the current rating period.
  3. Get the current player rating by using the results in the current rating period, as well as our cool fractional period secret sauce (borrowed from Lichess).
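The steps above can be sketched roughly as follows. This is a compact version of the update from Glickman's paper with one extra, assumed parameter `elapsed_periods` for the fractional part. It is a sketch, not the Lichess or instant-glicko-2 code, and all names are made up. With `elapsed_periods=1.0` it is the plain update you would use to close a full rating period.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales
TAU = 0.5         # system constant; 0.5 is the value in Glickman's worked example


def g(phi):
    return 1.0 / math.sqrt(1.0 + 3.0 * phi**2 / math.pi**2)


def expected_score(mu, mu_j, phi_j):
    return 1.0 / (1.0 + math.exp(-g(phi_j) * (mu - mu_j)))


def new_volatility(sigma, phi, v, delta, tau=TAU, eps=1e-6):
    """Iterative volatility update (step 5 of the paper)."""
    a = math.log(sigma**2)

    def f(x):
        ex = math.exp(x)
        return (ex * (delta**2 - phi**2 - v - ex)
                / (2 * (phi**2 + v + ex) ** 2)) - (x - a) / tau**2

    big_a = a
    if delta**2 > phi**2 + v:
        big_b = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        big_b = a - k * tau
    f_a, f_b = f(big_a), f(big_b)
    while abs(big_b - big_a) > eps:
        c = big_a + (big_a - big_b) * f_a / (f_b - f_a)
        f_c = f(c)
        if f_c * f_b <= 0:
            big_a, f_a = big_b, f_b
        else:
            f_a /= 2
        big_b, f_b = c, f_c
    return math.exp(big_a / 2)


def update(rating, rd, sigma, results, elapsed_periods=1.0):
    """results: list of (opponent_rating, opponent_rd, score), score in {0, 0.5, 1}."""
    mu, phi = (rating - 1500) / SCALE, rd / SCALE
    if not results:
        # No games: only the RD grows; rating and volatility stay the same.
        return rating, math.sqrt(phi**2 + elapsed_periods * sigma**2) * SCALE, sigma
    v_inv = 0.0
    score_sum = 0.0
    for opp_rating, opp_rd, score in results:
        mu_j, phi_j = (opp_rating - 1500) / SCALE, opp_rd / SCALE
        e = expected_score(mu, mu_j, phi_j)
        v_inv += g(phi_j) ** 2 * e * (1 - e)
        score_sum += g(phi_j) * (score - e)
    v = 1.0 / v_inv
    delta = v * score_sum
    new_sigma = new_volatility(sigma, phi, v, delta)
    # Assumed fractional twist: scale the RD growth by the elapsed fraction.
    phi_star = math.sqrt(phi**2 + elapsed_periods * new_sigma**2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star**2 + 1.0 / v)
    new_mu = mu + new_phi**2 * score_sum
    return new_mu * SCALE + 1500, new_phi * SCALE, new_sigma


# Steps 2 and 3: results so far in the current period, evaluated 40% into it.
results = [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]
print(update(1500, 200, 0.06, results, elapsed_periods=0.4))
```

Calling `update(1500, 200, 0.06, results)` with a full period reproduces the worked example from Glickman's paper: a rating of about 1464.06, an RD of about 151.52, and a volatility of about 0.05999.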

And that's it really.

Sources/further reading

Wikipedia

Elo

Glicko and Glicko-2

Other sources

Original paper on Glicko

Original paper on Glicko-2

Glickman's other adventures

Lichess' rating source code

Blogpost on how INVERSUS Deluxe implements Glicko-2

Blogpost on how the dev of INVERSUS Deluxe would want to implement Glicko-2
