Extended documentation for osu!catch score conversion changes

Description of the approximate conversion algorithm from score V1 to the new standardised scoring (score V2)

Rationale

The combo portions of stable's score V1 and score V2 differ in rate of progression. While score V1's combo portion grows quadratically with current combo, score V2's is initially logarithmic up to 200 combo and linear thereafter. This was demonstrated in prior work in #24823 and #24924, and is apparent in the screenshot below, taken from catch's TestSceneScoring:

[screenshot: combo score progression curves from TestSceneScoring]

Notably, catch is the only ruleset wherein the rate of ascent differs between score V1 and score V2. This means that for all other rulesets, it was completely fine to estimate the combo portion of score V2 / "standardised" scoring by just taking the linear ratio of the combo score achieved on the beatmap to the maximum combo score achievable on it.

This approach, however, breaks down badly in edge cases in catch due to the unequal rates of progression: a naive implementation would disproportionately tank scores with many short combos, especially on marathon maps. Thus, the intent of this document is to describe a method of converting score V1's combo portion to score V2.

The remaining details of the implementation do not differ from the gameplay implementation of "standardised" score / score V2, and as such are omitted from this document. We will focus explicitly on deriving the combo score estimation.

Derivation

Note that an accurate recomputation here is infeasible; we do not have the data or processing power to do so. Therefore, things will be rather loose and ballpark from here on out, but experimental results showed the method to perform adequately in most cases.

To begin with, let's note a few things:

  • For each score, we have the user's maximum combo. Therefore, we know that the user has achieved a combo streak this long.

  • We do not and cannot know which objects the combo streak consisted of. This matters because fruits and large droplets have different combo score weighting.

    Simplification: We will ignore this out of necessity and assume equal weighting.

  • We do not and cannot know the length of the other combo streaks in the score.

    Simplification: We will assume the worst case and presume that the lengths of the remaining combo streaks are evenly distributed across the map (excluding the known maximum combo streak and the number of misses). The intuition behind this being the worst case can be gathered by looking at the log-then-linear behaviour of combo score: the longer a combo streak is before a miss, the more points are cumulatively lost after the miss.

    This is also apparent on the following screenshots:

    [screenshots: combo score progression with 10 misses in sequence after 500 combo vs. 10 misses every 100 objects after 500 combo]

    We will also make a worst-case assumption that every time the player dropped combo, they had already reached 200 combo, therefore reaching the linear part of the curve and causing the maximum loss of combo score. This will punish scores on short maps more harshly, where this assumption will generally not hold.

The above simplifications allow us to produce a rough estimate of the combo score.

Assuming equal weighting of objects, the combo score for a single object, as a function of current combo, can be approximated by

$$ f(x) = \begin{cases} 0.5 & x < 2 \\ \log_4(x) & 2 \leq x < 200 \\ \log_4(200) & 200 \leq x \end{cases} $$

For convenience later, let's also denote

$$ F(x) = \int_0^x f(t) dt $$

In plain language, $F(x)$ is the total combo score achieved over a combo streak of length $x$.
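For reference, a minimal Python sketch of $f$ and a discrete approximation of $F$ under the equal-weighting simplification (function names are mine, not taken from the actual implementation):

```python
import math

def f(combo: int) -> float:
    """Approximate combo score weight of a single object hit at the given combo."""
    if combo < 2:
        return 0.5
    return math.log(min(combo, 200), 4)  # log4(combo), capped at log4(200)

def F(streak_length: int) -> float:
    """Discrete stand-in for the integral F(x): total combo score over one streak."""
    return sum(f(c) for c in range(1, streak_length + 1))

print(f(100), f(500))   # ~3.32 and ~3.82 (the cap)
print(F(200))           # total combo score of a single 200-long streak
```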

Thus, we can roughly approximate the number of combo score points lost by dropping combo after reaching 200 on a combo streak as

$$ \delta(x) = \int_0^x \log_4(200) dt - \int_0^x f(t) dt $$

This is generally a bit annoying to compute given the piecewise construction of $f(x)$, so let's just discard the $x < 2$ case. The $x \geq 200$ case actually simplifies itself, as then we have

$$ \begin{split} \delta(x) &= \int_0^x \log_4(200) \, dt - \int_0^x f(t) \, dt \\ &= \left( \int_0^{200} \log_4(200) \, dt + \int_{200}^x \log_4(200) \, dt \right) - \left( \int_0^{200} f(t) \, dt + \int_{200}^{x} f(t) \, dt \right) \\ &= \left( \int_0^{200} \log_4(200) \, dt + \int_{200}^x \log_4(200) \, dt \right) - \left( \int_0^{200} f(t) \, dt + \int_{200}^x \log_4(200) \, dt \right) \\ &= \int_0^{200} \log_4(200) \, dt - \int_0^{200} f(t) \, dt \\ &= \delta(200) \end{split} $$

so the function simply caps out at $\delta(200)$ for $x \geq 200$.

Thus, after some handwaving, I can write

$$ \delta(x) \approx \int_0^x \left( \log_4(200) - \log_4(t) \right) dt $$

which WolframAlpha says is

$$ \delta(x) \approx \frac{x (1 + \ln(200) - \ln(x))}{\ln(4)} $$
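As a quick sanity check (a throwaway sketch, not part of the PR), the closed form can be compared against direct numerical integration, and the plateau at $\delta(200)$ verified:

```python
import math

def delta_closed_form(x: float) -> float:
    """delta(x) via the closed form above; plateaus at delta(200) for x >= 200."""
    x = min(x, 200)
    return x * (1 + math.log(200) - math.log(x)) / math.log(4)

def delta_numeric(x: float, steps: int = 100_000) -> float:
    """Midpoint-rule integration of log4(200) - log4(t) over [0, x], for comparison."""
    x = min(x, 200)
    dt = x / steps
    return sum((math.log(200, 4) - math.log((i + 0.5) * dt, 4)) * dt for i in range(steps))

print(delta_closed_form(200))                             # ~144.27 (= 200 / ln 4)
print(delta_numeric(200))                                 # ~144.27, matching the closed form
print(delta_closed_form(500) == delta_closed_form(200))   # True: capped past 200 combo
```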

And therefore, if $x_{max}$ is the maximum possible combo length on the beatmap, $\overline{x}$ is the assumed average length of all remaining combo streaks and $n$ is the number of them,

$$ 1 - \frac{n \cdot \delta(\overline{x})}{F(x_{max})} $$

is our ballpark, pessimistic estimate of the proportion of the maximum achievable combo score retained by the player on that particular play.

One exception is made when, after subtracting the maximum combo streak and the number of misses from the beatmap's maximum combo, there are not enough objects left to divide into combo streaks at least one object long; in that case

$$ \frac{F(x_{max}) - F(x_{play})}{F(x_{max})} $$

where $x_{play}$ is the length of the longest combo streak in the play, is used as the estimate of the lost proportion instead.
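Putting the pieces together, here is a rough Python sketch of the whole estimation as I read it. All names are illustrative, and the assumption that the number of remaining streaks equals the miss count is mine; the actual implementation in the PR may differ in detail.

```python
import math

LN4 = math.log(4)

def F(x: float) -> float:
    """Total combo score over a single streak of length x (the x < 2 case is ignored)."""
    if x <= 1:
        return 0.0
    capped = min(x, 200)
    total = (capped * math.log(capped) - capped + 1) / LN4  # integral of log4(t) from 1 to capped
    if x > 200:
        total += (x - 200) * math.log(200) / LN4            # linear part past the 200 cap
    return total

def delta(x: float) -> float:
    """Combo score lost per combo break, using the closed form above (capped at delta(200))."""
    capped = max(min(x, 200), 1)
    return capped * (1 + math.log(200) - math.log(capped)) / LN4

def estimate_combo_proportion(beatmap_max_combo: int, score_max_combo: int, miss_count: int) -> float:
    """Pessimistic estimate of the proportion of the maximum combo score retained by the player."""
    if miss_count == 0:
        return 1.0  # full combo: nothing was lost (my own shortcut, not stated in the text)

    remaining = beatmap_max_combo - score_max_combo - miss_count
    streak_count = miss_count  # assumption: one remaining streak per miss

    if remaining < streak_count:
        # Fallback: assume a single streak of the longest combo achieved, then misses for the rest;
        # this is the complement of the lost proportion (F(x_max) - F(x_play)) / F(x_max) given above.
        return max(0.0, F(score_max_combo) / F(beatmap_max_combo))

    mean_streak = remaining / streak_count   # \overline{x}
    proportion = 1 - streak_count * delta(mean_streak) / F(beatmap_max_combo)
    return min(max(proportion, 0.0), 1.0)    # clamp to [0, 1], as mentioned in the case studies below
```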

Inspection of performance of catch scoring conversion algorithm for selected outlier scores

The general goal here is to reduce the number of incorrect conversions by examining gross outliers and eliminating them wherever possible.

The secondary goal is, whenever it is infeasible to get a somewhat precise estimate of standardised score from score V1, to prefer underestimating standardised score rather than overestimating, to ensure lazer scores can beat imported stable scores.

Most attention was given to:

  • big relative and absolute increases
  • big relative and absolute decreases
  • scores crossing the 1M threshold in either direction

Key examples driving iterations of changes

At the time of writing, the PR went through 3 iterations of changes. Each iteration was driven by a significant shortcoming in the previous iteration. This section will present key examples motivating why the changes were made.

| Mods active | None |
| --- | --- |
| Accuracy | 98.12% |
| Max combo | 370x |
| Score V1 | 22,267,704 |
| Standardised score (master) | 913,335 |
| Standardised score (this PR) | 556,242 |
| Final score after replay playback with score V2 active on stable ¹ | 718,179 |
| The above, but with 0.96x multiplier applied ² | 689,451 |
| Final score after replay playback on lazer ³ | 649,954 |

Implementing the naive version of the conversion (by just adjusting the combo and "accuracy" portions to match score V2) revealed that the combo estimation could be way off for catch scores, due to the difference in growth rate of combo score between score V1 and score V2 on stable.

Stable's score V1 is quadratic in nature, while score V2 is initially logarithmic and then linear. This means that for plays with a lot of combo breaks, linearly rescaling the achieved combo score relative to the maximum achievable combo score would massively underweight the converted total, because of the influence of the quadratic.

This spurred the implementation of the alternative method of estimating the combo portion for catch. Notably, this is a catch-specific issue, as it is the only ruleset in which the growth rate of combo score noticeably differs between score V1 and score V2:

  • osu! combo score is quadratic both in score V1 and score V2.
  • taiko and mania combo score is linear both in score V1 and score V2 (to a degree - it's actually logarithmic and then linear in score V2, but it will not cause such a massive difference in practice).

This was counteracted by a separate estimation of the combo portion for catch.
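To illustrate the size of the underweighting mentioned above, here is a toy comparison (constants and per-object base values omitted, equal object weighting assumed) of the combo score ratio a play broken into ten equal streaks would get under a quadratic V1-style curve versus the log-then-linear V2-style curve:

```python
import math

def v1_streak_score(length: int) -> float:
    """Toy V1 model: per-object bonus grows linearly with combo, so a streak totals ~length^2."""
    return sum(range(length))

def v2_streak_score(length: int) -> float:
    """Toy V2 model: per-object combo score is log4(combo), capped at log4(200)."""
    return sum(math.log(min(max(c, 2), 200), 4) for c in range(1, length + 1))

beatmap_max_combo = 2000
play = [beatmap_max_combo // 10] * 10  # play broken into ten equal 200-long streaks

v1_ratio = sum(v1_streak_score(s) for s in play) / v1_streak_score(beatmap_max_combo)
v2_ratio = sum(v2_streak_score(s) for s in play) / v2_streak_score(beatmap_max_combo)
print(f"{v1_ratio:.2f}")  # ~0.10: a naive linear rescale of V1 would award a tenth of the combo portion
print(f"{v2_ratio:.2f}")  # ~0.83: score V2 itself is far more lenient towards the combo breaks
```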

| Mods active | none |
| --- | --- |
| Accuracy | 92.90% |
| Max combo | 363x |
| Score V1 | 12,672,238 |
| Standardised score (master) | 360,524 |
| Standardised score (this PR) | 405,998 |
| Final score after replay playback with score V2 active (stable) ¹ | 600,035 |
| The above, but with 0.96x multiplier applied ² | 576,034 |
| Final score after replay playback (lazer) ³ | 558,553 |

The first iteration of the estimation of combo portion mentioned in the previous section was optimistic, by assuming that the player had hit as many combo streaks of length equal to the max combo as possible. This ended up overestimating scores that had many misses, by grossly overestimating how many combo streaks they achieved.

This spurred a change to the algorithm to err on the pessimistic side: the second implementation assumed that the player hit one combo streak of length equal to the max combo, and the remaining combo streaks are divided equally in length to fill out `max_beatmap_combo - max_score_combo - score_miss_count`. (If that turns out to floor to zero, then it is presumed that the player hit a single combo streak of length equal to max combo, then missed everything else.)

The above method is the pessimistic estimate, as missing after a long combo streak is more costly than missing multiple objects in quick succession. (You can use catch's TestSceneScoring to experimentally verify this.)
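A quick numeric check of that claim (same toy V2 model as before; equal object weighting assumed): on a 1000-object map with 10 misses, clustering the misses retains noticeably more combo score than spreading them out.

```python
import math

def streak_score(length: int) -> float:
    """Toy score V2 combo score of one streak: log4(combo) per object, capped at log4(200)."""
    return sum(math.log(min(max(c, 2), 200), 4) for c in range(1, length + 1))

clustered = [500, 490]          # 10 misses in a row after 500 combo (1000 objects total)
spread = [500] + [49] * 10      # a miss roughly every 50 objects after the initial 500 combo

print(sum(streak_score(s) for s in clustered))  # ~3501
print(sum(streak_score(s) for s in spread))     # ~2818: spread-out misses lose more combo score
```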

Big relative increases by diff%

| Mods active | SD, FL |
| --- | --- |
| Accuracy | 23.61% |
| Max combo | 26x (FC) |
| Score V1 | 14,519 |
| Standardised score (master) | 208,758 |
| Standardised score (this PR) | 742,467 |
| Final score after replay playback with score V2 active (stable) ¹ | N/A (no replay) |
| Final score after replay playback (lazer) | N/A (no replay) |

Not able to accurately investigate this one due to lack of a replay, but there are some hints, namely that the score is an FC but has lots of droplets missed.

I decided to investigate this one by setting a similar score with nomod on stable wherein I made sure to get FC but miss as many droplets as feasible. I got:

  • 37,528 score on stable at 44.44% accuracy
  • 748,130 score when above is imported to lazer
  • 745,374 score when playing back said imported replay in lazer
  • 779,646 score when taking said initial replay, binary-modifying it to specify Score V2, and playing that back again in stable with Score V2 active
  • 748,460 score when applying 0.96x multiplier to above

Causes of behaviour:

  • Correct modeling of portions of stable score V2
  • Correct estimation of combo portion in terms of matching score V2

Verdict: Working as intended

Similar cases that will not be discussed further:

| Mods active | HD |
| --- | --- |
| Accuracy | 100.00% |
| Max combo | 2498x (FC) |
| Score V1 | 195,830,244 |
| Standardised score (master) | 2,979,120 |
| Standardised score (this PR) | 8,255,109 |
| Final score after replay playback with score V2 active (stable) ¹ | skipped |
| The above, but with 0.96x multiplier applied ² | skipped |
| Final score after replay playback (lazer) ³ | skipped |

Causes of behaviour: This was and continues to be wrong, as it should not really be possible for any score to exceed 1M after conversion. The root cause is incorrect legacy score attribute calculation, which stems from the non-matching calculation of difficultyPeppyStars between stable and lazer. The unrounded value for this beatmap is 4.5, which:

  • stable rounds up to 5
  • lazer rounds down to 4

This has a knock-on effect of majorly underestimating maximum achievable combo portion, and thus majorly overestimating achieved bonus portion, which ends up amounting to almost 40 million points.

In order to match score V2, this pull increases the value of bananas caught from 50 to 200 points. This has the effect of increasing BonusScore and BonusScoreRatio in legacy scoring attributes fourfold, and thus explains the difference.

This is likely to be an issue across all rulesets and as such will not be addressed in this pull.

Verdict: Was and continues to be broken, fix may be attempted independently as it may impact beatmaps in all rulesets

Similar cases that will not be discussed further:

A curious variant in the other direction happens on beatmaps such as:

wherein stable rounds 3.5 down to 3, and lazer rounds it up to 4. This causes the combo portion to be overestimated by lazer, but due to clamping applied to fix previous issues (estimated bonus portion is clamped from below to 0), scores receive much more correct (yet still not completely correct) totals.

| Mods active | none |
| --- | --- |
| Accuracy | 53.24% |
| Max combo | 1469x |
| Score V1 | 11,188,424 |
| Standardised score (master) | 260,648 |
| Standardised score (this PR) | 662,888 |
| Final score after replay playback with score V2 active (stable) ¹ | skipped due to map length |
| The above, but with 0.96x multiplier applied ² | skipped due to map length |
| Final score after replay playback (lazer) ³ | 341,793 |

Causes of behaviour: This is a 23-minute long gimmick catch map with tiny droplet spam, a long combo streak in the middle that literally plays itself (the catcher stays in one position for minutes on end), and NaN abuse in control point specs. The map also looks wildly different between stable and lazer (it has a weird juice stream on the far right of the playfield at the start that's seemingly not there on stable).

Verdict: I honestly can't find it in myself to try and debug this one. Map is broken, should not be allowed to have a leaderboard.

Similar cases that will not be discussed further:

| Mods active | none |
| --- | --- |
| Accuracy | 100% |
| Max combo | 641x |
| Score V1 | 148,230 |
| Standardised score (master) | 384,000 |
| Standardised score (this PR) | 960,000 |
| Final score after replay playback with score V2 active (stable) ¹ | 1,000,000 |
| The above, but with 0.96x multiplier applied ² | 960,000 |
| Final score after replay playback (lazer) ³ | 960,000 |

Causes of behaviour: This is a beatmap-specific issue.

  • The beatmap has a difficultyPeppyStars of 0.
  • Therefore, the combo portion as estimated by CatchLegacyScoreSimulator was 0.
  • Therefore, the comboProportion in the standardised score migration code was also estimated as 0 - but actually should have been 1, as the score is an FC.
  • Therefore, previously only $400000 \cdot 0.96 = 384000$ points would be awarded rather than the correct 960000.

Verdict: Working as intended. The estimation of comboProportion for remaining rulesets likely needs to be revised to anticipate this scenario happening on other beatmaps, likely by checking for active mods (if relax/autopilot active, assume combo portion of 0, else 1).
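For illustration only, the suggested fallback could look something like the following (an entirely hypothetical helper, not code from the PR):

```python
def fallback_combo_proportion(active_mod_acronyms: set[str]) -> float:
    """When a beatmap yields no usable maximum combo information, assume a full combo,
    unless a combo-suppressing mod (relax / autopilot) is active."""
    return 0.0 if active_mod_acronyms & {"RX", "AP"} else 1.0
```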

Similar cases that will not be discussed further: All scores on this map.

| Mods active | none |
| --- | --- |
| Accuracy | 0.00% |
| Max combo | 0x (full combo) |
| Score V1 | 6,600 |
| Standardised score (master) | 386,871 |
| Standardised score (this PR) | 961,152 |
| Final score after replay playback with score V2 active (stable) ¹ | N/A (no replay) |
| The above, but with 0.96x multiplier applied ² | N/A (no replay) |
| Final score after replay playback (lazer) ³ | N/A (no replay) |

Causes of behaviour: The explanation for this one is simple. This is a gimmick map, which consists only of banana showers. Thus, it has 0x maximum combo. (Notably, on stable, score V2 totally does not handle this, and refuses to grant any points even if hitting bananas.)

With the current implementation, this means that the player will get 1M score automatically for free, and then any bananas caught will give 200 points per banana. So the old score automatically looks wrong.

Now, given the special circumstances, we have the following info:

  • Stable shows 6,600 total score. Each banana gives 1,100 score on stable given score V1. This means 6 bananas were hit.
  • In score V2, and thus in lazer standardised, each banana gives 200 points.
  • Classic mod has 0.96x multiplier.

Thus,

$$ (1000000 + 6 \cdot 200) \cdot 0.96 = 961152 $$

QED.

Verdict: Working as intended.

At this point I concluded that the "gain" side of the sheet looks solid enough and I don't see anything obviously broken.

Big relative decreases by diff%

| Mods active | DT, EZ |
| --- | --- |
| Accuracy | 34.88% |
| Max combo | 3x |
| Score V1 | 109,976 |
| Standardised score (master) | 237,056 |
| Standardised score (this PR) | 796 |
| Final score after replay playback with score V2 active (stable) ¹ | no replay |
| The above, but with 0.96x multiplier applied ² | no replay |
| Final score after replay playback (lazer) ³ | no replay |

Causes of behaviour: Broken loved map is broken but this time in the other direction. Not willing to spend time investigating.

Verdict: /shrug

| Mods active | NF, HD, HR, DT, FL |
| --- | --- |
| Accuracy | 31.32% |
| Max combo | 7x |
| Score V1 | 361,982 |
| Standardised score (master) | 87,982 |
| Standardised score (this PR) | 306 |
| Final score after replay playback with score V2 active (stable) ¹ | 51,037 |
| The above, but with 0.96x multiplier applied ² | 48,996 |
| Final score after replay playback (lazer) ³ | 31,992 |

Causes of behaviour:

  • Player did not move catcher once until almost the end of the map.
  • Hence, constant combo drops.
  • Beatmap has no small droplets, hence the "accuracy" / "small droplets" portion of score amounts to zero. All 1 million score is the combo portion.
  • The estimation of the combo portion returned a value so miserable as to only award 306 points.

Verdict: Very underestimated, but nobody should ever care.

| Mods active | NF |
| --- | --- |
| Accuracy | 0.00% |
| Max combo | 0x |
| Score V1 | 53,900 |
| Standardised score (master) | 113,849 |
| Standardised score (this PR) | 599 |
| Final score after replay playback with score V2 active (stable) ¹ | skipped |
| The above, but with 0.96x multiplier applied ² | skipped |
| Final score after replay playback (lazer) ³ | 4,704 |

Causes of behaviour: A "dodge the beat" play. Player missed every object until the final banana shower, at which point they collected 49 bananas.

Verdict: Very underestimated, but nobody should ever care. Much better than master anyhow.

| Mods active | NF, HD, HR, DT |
| --- | --- |
| Accuracy | 48.39% |
| Max combo | 5x |
| Score V1 | 32,059 |
| Standardised score (master) | 121,564 |
| Standardised score (this PR) | 3,324 |
| Final score after replay playback with score V2 active (stable) ¹ | ~82,100 |
| The above, but with 0.96x multiplier applied ² | 78,816 |
| Final score after replay playback (lazer) ³ | 51,466 |

Causes of behaviour:

  • No droplets in beatmap.
  • Underestimated combo portion due to very small max combo.

Verdict: A bit harsh, but not sure what else can be done at this point...

| Mods active | HR, DT, FL |
| --- | --- |
| Accuracy | 100.00% |
| Max combo | 187x (FC) |
| Score V1 | 630,351 |
| Standardised score (master) | 10,264,150 |
| Standardised score (this PR) | 1,471,369 |
| Final score after replay playback with score V2 active (stable) ¹ | 1,125,148 |
| The above, but with 0.96x multiplier applied ² | 1,080,142 |
| Final score after replay playback (lazer) ³ | 740,140 |

Causes of behaviour: Complete mess. The beatmap uses NaN lengths on some slider objects in the .osu file, leading to some objects not being parsed by lazer at all. Thus, the combo portion estimation actually goes negative, but is saved by the clamp to [0, 1].

Verdict: I want to get off Mr HeliX's wild ride.

| Mods active | DT |
| --- | --- |
| Accuracy | 16.42% |
| Max combo | 6x |
| Score V1 | 121,882 |
| Standardised score (master) | 419,343 |
| Standardised score (this PR) | 109,360 |
| Final score after replay playback with score V2 active (stable) ¹ | 190,100 |
| The above, but with 0.96x multiplier applied ² | 182,496 |
| Final score after replay playback (lazer) ³ | 196,099 |

Verdict: Perhaps combo portion a little underestimated, but generally operating as designed.

| Mods active | HR, FL |
| --- | --- |
| Accuracy | 86.01% |
| Max combo | 15x |
| Score V1 | 65,670 |
| Standardised score (master) | 591,250 |
| Standardised score (this PR) | 310,419 |
| Final score after replay playback with score V2 active (stable) ¹ | 564,286 |
| The above, but with 0.96x multiplier applied ² | 541,715 |
| Final score after replay playback (lazer) ³ | 612,086 |

Verdict: Well this is one of the truly unlucky ones. Score got majorly hammered by the negative combo estimation. Not sure it's worth doing much to salvage a C rank score.

| Mods active | none |
| --- | --- |
| Accuracy | 81.25% |
| Max combo | 13x |
| Score V1 | 42,466 |
| Standardised score (master) | 335,383 |
| Standardised score (this PR) | 183,674 |
| Final score after replay playback with score V2 active (stable) ¹ | 401,383 |
| The above, but with 0.96x multiplier applied ² | 385,327 |
| Final score after replay playback (lazer) ³ | 385,320 |

Verdict: Another harsh hit from combo estimation, as above. For rank D, seems like it's probably fine to keep as is, though.

Footnotes

  1. This was achieved by binary-modifying the .osr file to specify score V2 as an active mod for the replay, playing it back until the end, and observing the final total score.

  2. lazer applies a 0.96x multiplier due to the classic mod being attached to stable scores, so this is applied onto the stable total for comparison purposes.

  3. While this is provided for general reference, to ensure that no score value from actual playback clearly deviates from the rest, the score value after lazer's replay playback may be inaccurate due to imprecision in the replay playback itself. Hit statistics after playing back a replay routinely do not match the statistics stored in the replay.
