Extended documentation for osu!catch score conversion changes

Description of the approximate conversion algorithm from score V1 to the new standardised scoring (score V2)

Rationale

The combo portions of stable's score V1 and score V2 differ in rate of progression. While score V1's combo portion grows quadratically with current combo, score V2's is initially logarithmic up to 200 combo and linear thereafter. This was demonstrated in prior work in #24823 and #24924, and is apparent in the screenshot below, taken from catch's TestSceneScoring:

[screenshot: combo score progression curves from TestSceneScoring]

Notably, catch is the only ruleset wherein the rate of ascent differs between score V1 and score V2. This means that for all other rulesets, it was completely fine to estimate the combo portion of score V2 / "standardised" scoring by just taking the linear ratio of the combo score achieved on the beatmap to the maximum combo score achievable on it.

This approach, however, breaks down badly in edge cases in catch due to the unequal rates of progression: a naive implementation would disproportionately tank scores with many short combos, especially on marathon maps. Thus, the intent of this document is to describe a method of converting score V1's combo portion to score V2.

The remaining details of the implementation do not differ from the gameplay implementation of "standardised" score / score V2, and as such are omitted from this document. We will focus explicitly on deriving the combo score estimation.

Derivation

Note that an accurate recomputation here is infeasible; we do not have the data or processing power to do so. Therefore, things will be rather loose and ballpark from here on out, but experimental results showed the method to perform adequately in most cases.

To begin with, let's note a few things:

  • For each score, we have the user's maximum combo. Therefore, we know that the user has achieved a combo streak this long.

  • We do not and cannot know which objects the combo streak consisted of. This matters because fruits and large droplets have different combo score weighting.

    Simplification: We will ignore this out of necessity and assume equal weighting.

  • We do not and cannot know the length of the other combo streaks in the score.

    Simplification: We will assume the worst case and presume that the lengths of the remaining combo streaks are evenly distributed across the map (excluding the known maximum combo streak and the number of misses). The intuition behind this being the worst case can be gathered by looking at the log-then-linear behaviour of combo score: the longer a combo streak is before a miss, the more points are cumulatively lost after the miss.

    This is also apparent on the following screenshots:

    [screenshots: combo score progression with 10 misses in sequence after 500 combo vs. 10 misses every 100 objects after 500 combo]

    We will also make a worst-case assumption that every time the player dropped combo, they had already reached 200 combo, therefore reaching the linear part of the curve and causing the maximum loss of combo score. This will punish scores on short maps more harshly, where this assumption will generally not hold.

The above simplifications allow us to produce a rough estimate of the combo score.

Assuming equal weighting of objects, the combo score for a single object, as a function of current combo, can be approximated by

$$ f(x) = \begin{cases} 0.5 & x < 2 \\ \log_4(x) & 2 \leq x < 200 \\ \log_4(200) & 200 \leq x \end{cases} $$

For convenience later, let's also denote

$$ F(x) = \int_0^x f(t) dt $$

In plain language, $F(x)$ is the total combo score achieved over a combo streak of length $x$.
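For reference, a minimal Python sketch of $f$ and a discrete approximation of $F$ under the equal-weighting simplification (function names are mine, not taken from the actual implementation):

```python
import math

def f(combo: int) -> float:
    """Approximate combo score weight of a single object hit at the given combo."""
    if combo < 2:
        return 0.5
    return math.log(min(combo, 200), 4)  # log4(combo), capped at log4(200)

def F(streak_length: int) -> float:
    """Discrete stand-in for the integral F(x): total combo score over one streak."""
    return sum(f(c) for c in range(1, streak_length + 1))

print(f(100), f(500))   # ~3.32 and ~3.82 (the cap)
print(F(200))           # total combo score of a single 200-long streak
```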

Thus, we can roughly approximate the number of combo score points lost by dropping combo after reaching 200 on a combo streak as

$$ \delta(x) = \int_0^x \log_4(200) dt - \int_0^x f(t) dt $$

This is generally a bit annoying to compute given the piecewise construction of $f(x)$, so let's just discard the $x < 2$ case. The $x \geq 200$ case actually simplifies itself, as then we have

$$ \begin{split} \delta(x) &= \int_0^x \log_4(200) \, dt - \int_0^x f(t) \, dt \\ &= \left( \int_0^{200} \log_4(200) \, dt + \int_{200}^x \log_4(200) \, dt \right) - \left( \int_0^{200} f(t) \, dt + \int_{200}^{x} f(t) \, dt \right) \\ &= \left( \int_0^{200} \log_4(200) \, dt + \int_{200}^x \log_4(200) \, dt \right) - \left( \int_0^{200} f(t) \, dt + \int_{200}^x \log_4(200) \, dt \right) \\ &= \int_0^{200} \log_4(200) \, dt - \int_0^{200} f(t) \, dt \\ &= \delta(200) \end{split} $$

so the function simply caps out at $\delta(200)$ for $x \geq 200$.

Thus, after some handwaving, I can write

$$ \delta(x) \approx \int_0^x \left( \log_4(200) - \log_4(t) \right) dt $$

which WolframAlpha says is

$$ \delta(x) \approx \frac{x (1 + \ln(200) - \ln(x))}{\ln(4)} $$
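As a quick sanity check (a throwaway sketch, not part of the PR), the closed form can be compared against direct numerical integration, and the plateau at $\delta(200)$ verified:

```python
import math

def delta_closed_form(x: float) -> float:
    """delta(x) via the closed form above; plateaus at delta(200) for x >= 200."""
    x = min(x, 200)
    return x * (1 + math.log(200) - math.log(x)) / math.log(4)

def delta_numeric(x: float, steps: int = 100_000) -> float:
    """Midpoint-rule integration of log4(200) - log4(t) over [0, x], for comparison."""
    x = min(x, 200)
    dt = x / steps
    return sum((math.log(200, 4) - math.log((i + 0.5) * dt, 4)) * dt for i in range(steps))

print(delta_closed_form(200))                             # ~144.27 (= 200 / ln 4)
print(delta_numeric(200))                                 # ~144.27, matching the closed form
print(delta_closed_form(500) == delta_closed_form(200))   # True: capped past 200 combo
```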

And therefore, if $x_{max}$ is the maximum possible combo length on the beatmap, $\overline{x}$ is the assumed average length of all remaining combo streaks and $n$ is the number of them,

$$ 1 - \frac{n \cdot \delta(\overline{x})}{F(x_{max})} $$

is our ballpark, pessimistic estimate of the proportion of the maximum achievable combo score retained by the player on that particular play.

One exception is made when, after subtracting the maximum combo streak and the number of misses from the beatmap's maximum combo, there are not enough objects left to divide into combo streaks at least one object long; in that case

$$ \frac{F(x_{max}) - F(x_{play})}{F(x_{max})} $$

where $x_{play}$ is the length of the longest combo streak in the play, is used as the estimate of the lost proportion instead.
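Putting the pieces together, here is a rough Python sketch of the whole estimation as I read it. All names are illustrative, and the assumption that the number of remaining streaks equals the miss count is mine; the actual implementation in the PR may differ in detail.

```python
import math

LN4 = math.log(4)

def F(x: float) -> float:
    """Total combo score over a single streak of length x (the x < 2 case is ignored)."""
    if x <= 1:
        return 0.0
    capped = min(x, 200)
    total = (capped * math.log(capped) - capped + 1) / LN4  # integral of log4(t) from 1 to capped
    if x > 200:
        total += (x - 200) * math.log(200) / LN4            # linear part past the 200 cap
    return total

def delta(x: float) -> float:
    """Combo score lost per combo break, using the closed form above (capped at delta(200))."""
    capped = max(min(x, 200), 1)
    return capped * (1 + math.log(200) - math.log(capped)) / LN4

def estimate_combo_proportion(beatmap_max_combo: int, score_max_combo: int, miss_count: int) -> float:
    """Pessimistic estimate of the proportion of the maximum combo score retained by the player."""
    if miss_count == 0:
        return 1.0  # full combo: nothing was lost (my own shortcut, not stated in the text)

    remaining = beatmap_max_combo - score_max_combo - miss_count
    streak_count = miss_count  # assumption: one remaining streak per miss

    if remaining < streak_count:
        # Fallback: assume a single streak of the longest combo achieved, then misses for the rest;
        # this is the complement of the lost proportion (F(x_max) - F(x_play)) / F(x_max) given above.
        return max(0.0, F(score_max_combo) / F(beatmap_max_combo))

    mean_streak = remaining / streak_count   # \overline{x}
    proportion = 1 - streak_count * delta(mean_streak) / F(beatmap_max_combo)
    return min(max(proportion, 0.0), 1.0)    # clamp to [0, 1], as mentioned in the case studies below
```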

Inspection of performance of catch scoring conversion algorithm for selected outlier scores

The general goal here is to reduce the number of incorrect conversions by examining gross outliers and eliminating them wherever possible.

The secondary goal is, whenever it is infeasible to get a somewhat precise estimate of standardised score from score V1, to prefer underestimating standardised score rather than overestimating, to ensure lazer scores can beat imported stable scores.

Most attention was given to:

  • big relative and absolute increases
  • big relative and absolute decreases
  • scores crossing the 1M threshold in either direction

Key examples driving iterations of changes

At the time of writing, the PR went through 3 iterations of changes. Each iteration was driven by a significant shortcoming in the previous iteration. This section will present key examples motivating why the changes were made.

| Mods active | None |
| --- | --- |
| Accuracy | 98.12% |
| Max combo | 370x |
| Score V1 | 22,267,704 |
| Standardised score (master) | 913,335 |
| Standardised score (this PR) | 556,242 |
| Final score after replay playback with score V2 active on stable ¹ | 718,179 |
| The above, but with 0.96x multiplier applied ² | 689,451 |
| Final score after replay playback on lazer ³ | 649,954 |

Implementing the naive version of the conversion (by just adjusting the combo and "accuracy" portions to match score V2) revealed that the combo estimation could be way off for catch scores, due to the difference in growth rate of combo score between score V1 and score V2 on stable.

Stable's score V1 is quadratic in nature, while score V2 is initially logarithmic and then linear. This means that for plays with a lot of combo breaks, linearly rescaling the achieved combo score relative to the maximum achievable combo score would massively underweight the converted total, because of the influence of the quadratic.

This spurred the implementation of the alternative method of estimating the combo portion for catch. Notably, this is a catch-specific issue, as it is the only ruleset in which the growth rate of combo score noticeably differs between score V1 and score V2:

  • osu! combo score is quadratic both in score V1 and score V2.
  • taiko and mania combo score is linear both in score V1 and score V2 (to a degree - it's actually logarithmic and then linear in score V2, but it will not cause such a massive difference in practice).

This was counteracted by a separate estimation of the combo portion for catch.
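To illustrate the size of the underweighting mentioned above, here is a toy comparison (constants and per-object base values omitted, equal object weighting assumed) of the combo score ratio a play broken into ten equal streaks would get under a quadratic V1-style curve versus the log-then-linear V2-style curve:

```python
import math

def v1_streak_score(length: int) -> float:
    """Toy V1 model: per-object bonus grows linearly with combo, so a streak totals ~length^2."""
    return sum(range(length))

def v2_streak_score(length: int) -> float:
    """Toy V2 model: per-object combo score is log4(combo), capped at log4(200)."""
    return sum(math.log(min(max(c, 2), 200), 4) for c in range(1, length + 1))

beatmap_max_combo = 2000
play = [beatmap_max_combo // 10] * 10  # play broken into ten equal 200-long streaks

v1_ratio = sum(v1_streak_score(s) for s in play) / v1_streak_score(beatmap_max_combo)
v2_ratio = sum(v2_streak_score(s) for s in play) / v2_streak_score(beatmap_max_combo)
print(f"{v1_ratio:.2f}")  # ~0.10: a naive linear rescale of V1 would award a tenth of the combo portion
print(f"{v2_ratio:.2f}")  # ~0.83: score V2 itself is far more lenient towards the combo breaks
```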

| Mods active | none |
| --- | --- |
| Accuracy | 92.90% |
| Max combo | 363x |
| Score V1 | 12,672,238 |
| Standardised score (master) | 360,524 |
| Standardised score (this PR) | 405,998 |
| Final score after replay playback with score V2 active (stable) ¹ | 600,035 |
| The above, but with 0.96x multiplier applied ² | 576,034 |
| Final score after replay playback (lazer) ³ | 558,553 |

The first iteration of the estimation of combo portion mentioned in the previous section was optimistic, by assuming that the player had hit as many combo streaks of length equal to the max combo as possible. This ended up overestimating scores that had many misses, by grossly overestimating how many combo streaks they achieved.

This spurred a change to the algorithm to err on the pessimistic side: the second implementation assumed that the player hit one combo streak of length equal to the max combo, and the remaining combo streaks are divided equally in length to fill out `max_beatmap_combo - max_score_combo - score_miss_count`. (If that turns out to floor to zero, then it is presumed that the player hit a single combo streak of length equal to max combo, then missed everything else.)

The above method is the pessimistic estimate, as missing after a long combo streak is more costly than missing multiple objects in quick succession. (You can use catch's TestSceneScoring to experimentally verify this.)
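A quick numeric check of that claim (same toy V2 model as before; equal object weighting assumed): on a 1000-object map with 10 misses, clustering the misses retains noticeably more combo score than spreading them out.

```python
import math

def streak_score(length: int) -> float:
    """Toy score V2 combo score of one streak: log4(combo) per object, capped at log4(200)."""
    return sum(math.log(min(max(c, 2), 200), 4) for c in range(1, length + 1))

clustered = [500, 490]          # 10 misses in a row after 500 combo (1000 objects total)
spread = [500] + [49] * 10      # a miss roughly every 50 objects after the initial 500 combo

print(sum(streak_score(s) for s in clustered))  # ~3501
print(sum(streak_score(s) for s in spread))     # ~2818: spread-out misses lose more combo score
```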

Big relative increases by diff%

| Mods active | SD, FL |
| --- | --- |
| Accuracy | 23.61% |
| Max combo | 26x (FC) |
| Score V1 | 14,519 |
| Standardised score (master) | 208,758 |
| Standardised score (this PR) | 742,467 |
| Final score after replay playback with score V2 active (stable) ¹ | N/A (no replay) |
| Final score after replay playback (lazer) | N/A (no replay) |

Not able to accurately investigate this one due to lack of a replay, but there are some hints, namely that the score is an FC but has lots of droplets missed.

I decided to investigate this one by setting a similar score with nomod on stable wherein I made sure to get FC but miss as many droplets as feasible. I got:

  • 37,528 score on stable at 44.44% accuracy
  • 748,130 score when above is imported to lazer
  • 745,374 score when playing back said imported replay in lazer
  • 779,646 score when taking said initial replay, binary-modifying it to specify Score V2, and playing that back again in stable with Score V2 active
  • 748,460 score when applying 0.96x multiplier to above

Causes of behaviour:

  • Correct modeling of portions of stable score V2
  • Correct estimation of combo portion in terms of matching score V2

Verdict: Working as intended

Similar cases that will not be discussed further:

| Mods active | HD |
| --- | --- |
| Accuracy | 100.00% |
| Max combo | 2498x (FC) |
| Score V1 | 195,830,244 |
| Standardised score (master) | 2,979,120 |
| Standardised score (this PR) | 8,255,109 |
| Final score after replay playback with score V2 active (stable) ¹ | skipped |
| The above, but with 0.96x multiplier applied ² | skipped |
| Final score after replay playback (lazer) ³ | skipped |

Causes of behaviour: This was and continues to be wrong, as it should not really be possible for any score to exceed 1M after conversion. The root cause is incorrect legacy score attribute calculation, which stems from the non-matching calculation of difficultyPeppyStars between stable and lazer. The unrounded value for this beatmap is 4.5, which:

  • stable rounds up to 5
  • lazer rounds down to 4

This has a knock-on effect of majorly underestimating maximum achievable combo portion, and thus majorly overestimating achieved bonus portion, which ends up amounting to almost 40 million points.

In order to match score V2, this pull increases the value of bananas caught from 50 to 200 points. This has the effect of increasing BonusScore and BonusScoreRatio in legacy scoring attributes fourfold, and thus explains the difference.

This is likely to be an issue across all rulesets and as such will not be addressed in this pull.

Verdict: Was and continues to be broken, fix may be attempted independently as it may impact beatmaps in all rulesets

Similar cases that will not be discussed further:

A curious variant in the other direction happens on beatmaps such as:

wherein stable rounds 3.5 down to 3, and lazer rounds it up to 4. This causes the combo portion to be overestimated by lazer, but due to clamping applied to fix previous issues (estimated bonus portion is clamped from below to 0), scores receive much more correct (yet still not completely correct) totals.

| Mods active | none |
| --- | --- |
| Accuracy | 53.24% |
| Max combo | 1469x |
| Score V1 | 11,188,424 |
| Standardised score (master) | 260,648 |
| Standardised score (this PR) | 662,888 |
| Final score after replay playback with score V2 active (stable) ¹ | skipped due to map length |
| The above, but with 0.96x multiplier applied ² | skipped due to map length |
| Final score after replay playback (lazer) ³ | 341,793 |

Causes of behaviour: This is a 23-minute long gimmick catch map with tiny droplet spam, a long combo streak in the middle that literally plays itself (the catcher stays in one position for minutes on end), and NaN abuse in control point specs. The map also looks wildly different between stable and lazer (it has a weird juice stream on the far right of the playfield at the start that's seemingly not there on stable).

Verdict: I honestly can't find it in myself to try and debug this one. Map is broken, should not be allowed to have a leaderboard.

Similar cases that will not be discussed further:

| Mods active | none |
| --- | --- |
| Accuracy | 100% |
| Max combo | 641x |
| Score V1 | 148,230 |
| Standardised score (master) | 384,000 |
| Standardised score (this PR) | 960,000 |
| Final score after replay playback with score V2 active (stable) ¹ | 1,000,000 |
| The above, but with 0.96x multiplier applied ² | 960,000 |
| Final score after replay playback (lazer) ³ | 960,000 |

Causes of behaviour: This is a beatmap-specific issue.

  • The beatmap has a difficultyPeppyStars of 0.
  • Therefore, the combo portion as estimated by CatchLegacyScoreSimulator was 0.
  • Therefore, the comboProportion in the standardised score migration code was also estimated as 0 - but actually should have been 1, as the score is an FC.
  • Therefore, previously only $400000 \cdot 0.96 = 384000$ points would be awarded rather than the correct 960000.

Verdict: Working as intended. The estimation of comboProportion for remaining rulesets likely needs to be revised to anticipate this scenario happening on other beatmaps, likely by checking for active mods (if relax/autopilot active, assume combo portion of 0, else 1).
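For illustration only, the suggested fallback could look something like the following (an entirely hypothetical helper, not code from the PR):

```python
def fallback_combo_proportion(active_mod_acronyms: set[str]) -> float:
    """When a beatmap yields no usable maximum combo information, assume a full combo,
    unless a combo-suppressing mod (relax / autopilot) is active."""
    return 0.0 if active_mod_acronyms & {"RX", "AP"} else 1.0
```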

Similar cases that will not be discussed further: All scores on this map.

| Mods active | none |
| --- | --- |
| Accuracy | 0.00% |
| Max combo | 0x (full combo) |
| Score V1 | 6,600 |
| Standardised score (master) | 386,871 |
| Standardised score (this PR) | 961,152 |
| Final score after replay playback with score V2 active (stable) ¹ | N/A (no replay) |
| The above, but with 0.96x multiplier applied ² | N/A (no replay) |
| Final score after replay playback (lazer) ³ | N/A (no replay) |

Causes of behaviour: The explanation for this one is simple. This is a gimmick map, which consists only of banana showers. Thus, it has 0x maximum combo. (Notably, on stable, score V2 totally does not handle this, and refuses to grant any points even if hitting bananas.)

With the current implementation, this means that the player will get 1M score automatically for free, and then any bananas caught will give 200 points per banana. So the old score automatically looks wrong.

Now, given the special circumstances, we have the following info:

  • Stable shows 6,600 total score. Each banana gives 1,100 score on stable given score V1. This means 6 bananas were hit.
  • In score V2, and thus in lazer standardised, each banana gives 200 points.
  • Classic mod has 0.96x multiplier.

Thus,

$$ (1000000 + 6 \cdot 200) \cdot 0.96 = 961152 $$

QED.

Verdict: Working as intended.

At this point I concluded that the "gain" side of the sheet looks solid enough and I don't see anything obviously broken.

Big relative decreases by diff%

| Mods active | DT, EZ |
| --- | --- |
| Accuracy | 34.88% |
| Max combo | 3x |
| Score V1 | 109,976 |
| Standardised score (master) | 237,056 |
| Standardised score (this PR) | 796 |
| Final score after replay playback with score V2 active (stable) ¹ | no replay |
| The above, but with 0.96x multiplier applied ² | no replay |
| Final score after replay playback (lazer) ³ | no replay |

Causes of behaviour: Broken loved map is broken but this time in the other direction. Not willing to spend time investigating.

Verdict: /shrug

| Mods active | NF, HD, HR, DT, FL |
| --- | --- |
| Accuracy | 31.32% |
| Max combo | 7x |
| Score V1 | 361,982 |
| Standardised score (master) | 87,982 |
| Standardised score (this PR) | 306 |
| Final score after replay playback with score V2 active (stable) ¹ | 51,037 |
| The above, but with 0.96x multiplier applied ² | 48,996 |
| Final score after replay playback (lazer) ³ | 31,992 |

Causes of behaviour:

  • Player did not move catcher once until almost the end of the map.
  • Hence, constant combo drops.
  • Beatmap has no small droplets, hence the "accuracy" / "small droplets" portion of score amounts to zero. All 1 million score is the combo portion.
  • The estimation of the combo portion returned a value so miserable as to only award 306 points.

Verdict: Very underestimated, but nobody should ever care.

| Mods active | NF |
| --- | --- |
| Accuracy | 0.00% |
| Max combo | 0x |
| Score V1 | 53,900 |
| Standardised score (master) | 113,849 |
| Standardised score (this PR) | 599 |
| Final score after replay playback with score V2 active (stable) ¹ | skipped |
| The above, but with 0.96x multiplier applied ² | skipped |
| Final score after replay playback (lazer) ³ | 4,704 |

Causes of behaviour: A "dodge the beat" play. Player missed every object until the final banana shower, at which point they collected 49 bananas.

Verdict: Very underestimated, but nobody should ever care. Much better than master anyhow.

| Mods active | NF, HD, HR, DT |
| --- | --- |
| Accuracy | 48.39% |
| Max combo | 5x |
| Score V1 | 32,059 |
| Standardised score (master) | 121,564 |
| Standardised score (this PR) | 3,324 |
| Final score after replay playback with score V2 active (stable) ¹ | ~82,100 |
| The above, but with 0.96x multiplier applied ² | 78,816 |
| Final score after replay playback (lazer) ³ | 51,466 |

Causes of behaviour:

  • No droplets in beatmap.
  • Underestimated combo portion due to very small max combo.

Verdict: A bit harsh, but not sure what else can be done at this point...

| Mods active | HR, DT, FL |
| --- | --- |
| Accuracy | 100.00% |
| Max combo | 187x (FC) |
| Score V1 | 630,351 |
| Standardised score (master) | 10,264,150 |
| Standardised score (this PR) | 1,471,369 |
| Final score after replay playback with score V2 active (stable) ¹ | 1,125,148 |
| The above, but with 0.96x multiplier applied ² | 1,080,142 |
| Final score after replay playback (lazer) ³ | 740,140 |

Causes of behaviour: Complete mess. The beatmap uses NaN lengths on some slider objects in the .osu file, leading to some objects not being parsed by lazer at all. Thus, the combo portion estimation actually goes negative, but is saved by the clamp to [0, 1].

Verdict: I want to get off Mr HeliX's wild ride.

| Mods active | DT |
| --- | --- |
| Accuracy | 16.42% |
| Max combo | 6x |
| Score V1 | 121,882 |
| Standardised score (master) | 419,343 |
| Standardised score (this PR) | 109,360 |
| Final score after replay playback with score V2 active (stable) ¹ | 190,100 |
| The above, but with 0.96x multiplier applied ² | 182,496 |
| Final score after replay playback (lazer) ³ | 196,099 |

Verdict: Perhaps combo portion a little underestimated, but generally operating as designed.

| Mods active | HR, FL |
| --- | --- |
| Accuracy | 86.01% |
| Max combo | 15x |
| Score V1 | 65,670 |
| Standardised score (master) | 591,250 |
| Standardised score (this PR) | 310,419 |
| Final score after replay playback with score V2 active (stable) ¹ | 564,286 |
| The above, but with 0.96x multiplier applied ² | 541,715 |
| Final score after replay playback (lazer) ³ | 612,086 |

Verdict: Well this is one of the truly unlucky ones. Score got majorly hammered by the negative combo estimation. Not sure it's worth doing much to salvage a C rank score.

| Mods active | none |
| --- | --- |
| Accuracy | 81.25% |
| Max combo | 13x |
| Score V1 | 42,466 |
| Standardised score (master) | 335,383 |
| Standardised score (this PR) | 183,674 |
| Final score after replay playback with score V2 active (stable) ¹ | 401,383 |
| The above, but with 0.96x multiplier applied ² | 385,327 |
| Final score after replay playback (lazer) ³ | 385,320 |

Verdict: Another harsh hit from combo estimation, as above. For rank D, seems like it's probably fine to keep as is, though.

Footnotes

  1. This was achieved by binary-modifying the .osr file to specify score V2 as an active mod for the replay, playing it back until the end, and observing the final total score.

  2. lazer applies a 0.96x multiplier due to the classic mod being attached to stable scores, so this is applied onto the stable total for comparison purposes.

  3. While this is provided for general reference, to ensure that no score value from actual playback clearly deviates from the rest, the score value after lazer's replay playback may be inaccurate due to imprecision in the replay playback itself. Hit statistics after playing back a replay routinely do not match the statistics stored in the replay.
